Understanding I/O in the zkVM
In the Hello World Tutorial, we had a brief introduction to how to perform I/O operations in the zkVM. Now we'll dive deeper into the subject. Keep reading to learn more about:
- The different types of data in the zkVM
- How to handle inputs and outputs
- Best practices for handling I/O in the zkVM
Setting the Stage
We can think of programming in the zkVM as moving between two worlds: the host, where computation works the same as in any other regular program, and the guest, where computation is done in a zero-knowledge environment.
Because the guest runs in a zk environment, it has fewer ways to obtain data
than the host.
The only way to send data between the host and guest is through the
Executor Environment.
This data transfer is done through file descriptors.
The zkVM specifies four default file descriptors: stdin,
stdout, stderr, and journal.
They're defined in the fileno module.
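To keep the four descriptors straight, here is a minimal sketch that models them as a plain Rust enum. `Fd` and `Flow` are hypothetical names invented for illustration, not part of the risc0_zkvm API; the mapping only restates the role each descriptor plays in this guide, and the actual constants live in the fileno module.

```rust
/// Hypothetical model of the four default zkVM file descriptors.
/// This is NOT the real `fileno` module; it only mirrors each descriptor's role.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fd {
    Stdin,
    Stdout,
    Stderr,
    Journal,
}

/// Where the data written to a descriptor ends up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flow {
    /// Written by the host, read by the guest.
    HostToGuest,
    /// Written by the guest, read only by the host (private).
    GuestToHostPrivate,
    /// Written by the guest and included in the receipt (public).
    GuestToReceiptPublic,
}

impl Fd {
    pub fn flow(self) -> Flow {
        match self {
            Fd::Stdin => Flow::HostToGuest,
            Fd::Stdout | Fd::Stderr => Flow::GuestToHostPrivate,
            Fd::Journal => Flow::GuestToReceiptPublic,
        }
    }
}
```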
The zkVM Data Model
The zkVM has a data model that distinguishes between public and private data. By "public" we mean data that is included in the journal and becomes part of the proof, while "private" data is only accessible by the host and guest.
If your application handles sensitive data, it's important to know exactly
which data is committed to the journal, so that no sensitive data
ends up in the proof.
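As a hypothetical illustration of keeping sensitive data out of the journal, the sketch below models a guest's output as two byte buffers. `GuestOutput` and `age_check_guest` are invented names, and plain byte vectors stand in for the real stdout and journal; the point is only that the secret birth year and the computed age never reach the public side.

```rust
/// Hypothetical stand-in for what a guest emits: bytes sent to the host
/// over stdout (private) and bytes committed to the journal (public).
#[derive(Debug, Default)]
pub struct GuestOutput {
    pub stdout: Vec<u8>,
    pub journal: Vec<u8>,
}

/// A toy guest that receives a secret birth year. It commits only the fact
/// "this person is at least 18" and keeps the computed age private.
pub fn age_check_guest(birth_year: u32, current_year: u32) -> GuestOutput {
    let age = current_year.saturating_sub(birth_year);
    let mut out = GuestOutput::default();
    // Public: a single byte (1 = adult, 0 = minor) that any verifier can read.
    out.journal.push((age >= 18) as u8);
    // Private: only the host that ran the guest sees this value.
    out.stdout.extend_from_slice(&age.to_le_bytes());
    out
}
```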
Sending Data from the Host to the Guest
The stdin file descriptor is used to send input data from the host to
the guest.
In the host, it's possible to set the input data in the Executor
Environment through the methods
write and
write_slice.
The guest has corresponding functions read and
read_slice to read the input data.
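The sketch below is a hypothetical, in-memory model of stdin, assuming a simple length-prefixed string encoding that is not the zkVM's actual wire format. `MockStdin`, `write_str` and `read_str` are invented names. It shows the property that matters in practice: the guest reads values back in the same order, and with the same types, that the host wrote them.

```rust
/// Hypothetical in-memory model of the stdin descriptor. The host appends
/// values; the guest consumes them front to back.
#[derive(Debug, Default)]
pub struct MockStdin {
    buf: Vec<u8>,
    pos: usize,
}

impl MockStdin {
    /// Host side: append a length-prefixed UTF-8 string.
    pub fn write_str(&mut self, s: &str) {
        self.buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
        self.buf.extend_from_slice(s.as_bytes());
    }

    /// Guest side: read the next length-prefixed string, or `None` if the
    /// buffer is exhausted or malformed.
    pub fn read_str(&mut self) -> Option<String> {
        let header = self.buf.get(self.pos..self.pos + 4)?;
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let start = self.pos + 4;
        let body = self.buf.get(start..start + len)?;
        let s = String::from_utf8(body.to_vec()).ok()?;
        // Advance the cursor only after a complete, valid read.
        self.pos = start + len;
        Some(s)
    }
}
```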
Writing to the guest's stdin can be as simple as the code below. For a real example, check the Voting Machine example.
```rust
use risc0_zkvm::ExecutorEnv;

// Host code, inside a function returning a `Result` so `?` can propagate errors.
let input = "Hello, guest!";
let env = ExecutorEnv::builder().write(&input)?.build()?;
```
Since we mentioned the read/write methods and their _slice variants, let's
take a moment to understand the difference between them.
A Note on Performance
When sending data from host to guest and vice versa, we can either
(de)serialize the data or send raw bytes.
It's a trade-off between convenience and performance.
By using the standard functions read,
write and commit (which we'll
cover in the following sections), the zkVM performs automatic (de)serialization of the
data.
This enables easy handling of complex data structures, but it comes with a performance cost.
Using the _slice variants, on the other hand, allows for sending raw bytes,
which is faster but usually requires less ergonomic code.
It is good practice to use the standard functions first, switching to the
_slice variants when performance becomes an issue or when optimizing the code
to save on cycles.
Since both approaches can be used together, switching from one to the other
shouldn't be a problem.
We have a more detailed explanation on guest code
optimization if you want to learn more about this topic.
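To make the ergonomic side of this trade-off concrete, here is a hypothetical sketch of the hand-written packing that raw byte transfers tend to require. `Point`, `points_to_bytes` and `points_from_bytes` are invented for illustration; with the standard functions, the struct would instead derive serde's `Serialize` and `Deserialize` and be passed to `write`/`read` as a whole.

```rust
/// Hypothetical record passed between host and guest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Point {
    pub x: u32,
    pub y: u32,
}

/// Manual packing, the kind of code a raw `_slice` transfer requires:
/// every field is laid out by hand as little-endian bytes.
pub fn points_to_bytes(points: &[Point]) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::with_capacity(points.len() * 8);
    for p in points {
        out.extend_from_slice(&p.x.to_le_bytes());
        out.extend_from_slice(&p.y.to_le_bytes());
    }
    out
}

/// The matching manual unpacking on the receiving side.
/// Returns `None` if the byte count is not a multiple of 8.
pub fn points_from_bytes(bytes: &[u8]) -> Option<Vec<Point>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    let points = bytes
        .chunks_exact(8)
        .map(|c| Point {
            x: u32::from_le_bytes([c[0], c[1], c[2], c[3]]),
            y: u32::from_le_bytes([c[4], c[5], c[6], c[7]]),
        })
        .collect();
    Some(points)
}
```

Every layout decision (byte order, field order, length checks) becomes the programmer's responsibility, which is the price paid for skipping (de)serialization.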
Sending Private Data from the Guest
Returning to the data flow: after getting data from the host and performing some
transformations on it, we might want to send private data back.
Both the stdout and stderr file descriptors are used to send
data from the guest to the host privately, and a convenient way to
send data to the host's stdout is the
write method.
Writing to the host's stdout can be as simple as the code below. For a real example, check the Voting Machine example.
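```rust
use risc0_zkvm::guest::env;

// Guest code: send a value to the host's stdout (private, not part of the proof).
let data = "Hello, host!";
env::write(&data);
```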
On the host side, it's possible to read data coming from the guest by reading
the buffer that was originally passed to the Executor
Environment through its methods
stdout and
stderr.
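A minimal sketch of this flow, under the assumption that the guest's private output can be modeled as any `std::io::Write` sink: the host owns a `Vec<u8>`, the hypothetical guest logic writes into it, and the host decodes it afterwards. `guest_sum` and `host_read_sum` are illustrative names, and the little-endian `u32` encoding is a simplification rather than the zkVM's own format.

```rust
use std::io::{self, Write};

/// Hypothetical guest logic: sum the inputs and send the result as private
/// output. The sink is any `Write` implementor, standing in for the buffer
/// the host attached as the guest's stdout.
pub fn guest_sum<W: Write>(inputs: &[u32], stdout: &mut W) -> io::Result<()> {
    let sum: u32 = inputs.iter().sum();
    stdout.write_all(&sum.to_le_bytes())
}

/// Hypothetical host-side decoding of the buffer once the guest has run.
pub fn host_read_sum(buf: &[u8]) -> Option<u32> {
    if buf.len() != 4 {
        return None;
    }
    Some(u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]))
}
```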
The private data alluded to here is not included in the proof, but it is accessible to the host. This means that the party generating the proof can access the data, so you should take this into consideration. If you don't want to let private data leak to any other party, it's possible to achieve full secrecy by proving locally.
A good practice for handling sensitive data is proof composition: split the proving process into smaller parts, prove the parts involving sensitive data locally, and later combine them with the larger program through composition, using a capable proving service like Bonsai to speed up proof generation.
Sending Public Data from the Guest
We saw how to send private data directly to the host, but we might also want
to commit public data, attesting to some fact that we
want to share with the world.
We can do so by sending this data to the journal file descriptor.
This data will be included in the proof and can be accessed by any party through
the Receipt after the proving process.
Writing to the journal is done through the methods
commit and commit_slice.
Writing to the journal can be as simple as the code below. For a real example, check the Voting Machine example.
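```rust
use risc0_zkvm::guest::env;

// Guest code: commit a value to the journal (public, included in the receipt).
let data = "Hello, journal!";
env::commit(&data);
```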
On the host side (or in any other regular program that has access to the
Receipt), reading from the journal is achieved by calling
the Journal's method
decode.
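The hypothetical sketch below models the journal as an append-only byte buffer with `commit` on the guest side and `decode` on the reading side. `MockJournal` and `PublicOutput` are invented for illustration; the real `Journal::decode` is generic over the target type, but the same rule applies: decode into exactly the type that was committed.

```rust
/// Hypothetical public output type, shared by guest and host.
#[derive(Debug, PartialEq, Eq)]
pub struct PublicOutput {
    pub votes_counted: u32,
    pub frozen: bool,
}

/// Hypothetical append-only journal: once bytes are committed they are
/// part of the receipt, and nothing removes them.
#[derive(Debug, Default)]
pub struct MockJournal {
    bytes: Vec<u8>,
}

impl MockJournal {
    /// Guest side: commit a value by appending its encoding.
    pub fn commit(&mut self, out: &PublicOutput) {
        self.bytes.extend_from_slice(&out.votes_counted.to_le_bytes());
        self.bytes.push(out.frozen as u8);
    }

    /// Reader side: decode the journal back into the committed type.
    /// Returns `None` if the bytes do not match the expected layout.
    pub fn decode(&self) -> Option<PublicOutput> {
        let b = &self.bytes;
        if b.len() != 5 || b[4] > 1 {
            return None;
        }
        Some(PublicOutput {
            votes_counted: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
            frozen: b[4] == 1,
        })
    }
}
```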
Reading Private Data in the Host
Once the guest has sent data, we can read it in the host by using
the from_slice function, which deserializes
the data from a buffer into the desired type.
Reading from the host's stdout can be as simple as the code below. For a real example, check the Voting Machine example.
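```rust
use risc0_zkvm::serde::from_slice;

// `output` is the buffer passed to the builder's `stdout` method;
// `Type` stands for whatever type the guest wrote.
let result: Type = from_slice(&output)?;
```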
If data was sent in its raw form by using a _slice variant, you'll need to
handle the bit fiddling manually.
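For instance, suppose the guest sent a 32-byte digest as raw bytes. A hypothetical host-side helper like `digest_hex` below (an invented name, not a zkVM function) has to validate the length and interpret the bytes itself, since no deserializer does that work for it.

```rust
/// Hypothetical host-side helper for a digest the guest sent as raw bytes.
/// Checks the length by hand and renders the bytes as lowercase hex.
pub fn digest_hex(raw: &[u8]) -> Result<String, String> {
    if raw.len() != 32 {
        return Err(format!("expected 32 bytes, got {}", raw.len()));
    }
    Ok(raw.iter().map(|b| format!("{:02x}", b)).collect())
}
```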
Reading Public Data in the Host
Reading public data is done by accessing the Journal that
is contained in the resulting Receipt after the proving
process.
This can be done by calling the decode method on the
journal instance.
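```rust
use risc0_zkvm::default_prover;

// Produce a receipt by proving the specified ELF binary (`ELF` is your guest binary).
let prover = default_prover();
let receipt = prover.prove(env, ELF)?.receipt;
// Decode the journal to access the public data; `Type` must match what the guest committed.
let public_data: Type = receipt.journal.decode()?;
```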
Sharing Data Structures Between Host and Guest
A good pattern to follow when handling shared data structures between the host
and guest is to have a common core module that contains the shared data
structures.
This way, both host and guest can import common data structures and consume them
as needed.
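A minimal sketch of the pattern in a single file: three inline modules stand in for what is typically a separate core crate plus the host and guest code. `common`, `Ballot`, `make_ballot` and `count_yes` are hypothetical names chosen for illustration.

```rust
/// Hypothetical shared module (the examples call theirs `core`). Both sides
/// import the same definitions, so the type written on one side is exactly
/// the type read on the other.
pub mod common {
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Ballot {
        pub voter: u32,
        pub choice: bool,
    }
}

/// Hypothetical host code: builds values of the shared type.
pub mod host {
    use super::common::Ballot;

    pub fn make_ballot(voter: u32, choice: bool) -> Ballot {
        Ballot { voter, choice }
    }
}

/// Hypothetical guest code: consumes the very same type.
pub mod guest {
    use super::common::Ballot;

    pub fn count_yes(ballots: &[Ballot]) -> usize {
        ballots.iter().filter(|b| b.choice).count()
    }
}
```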
A good example of this pattern being used is the JWT
Validator.
In its core module, it defines common structures that will
be later used in the host and
guest modules.
Similarly, the Chess example shares its
core module between the host and
guest.
Other examples leveraging this pattern can be found in the examples page.
Putting It All Together
Now that we've covered some details about I/O in the zkVM, let's see how a real program implements it in practice.
We'll cover the Voting Machine example. This example is a simple voting machine that allows users to vote for a candidate. We'll link to relevant parts of the code as we go along, and we recommend opening the linked files in a separate tab to follow along.
The program is a state machine that supports three operations:
- Init: configures the initial state
- Submit: allows a user to submit a vote
- Freeze: reveals the result of the election and closes the voting
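The three operations listed above can be sketched as a small state machine. This is a hypothetical simplification, not the example's actual code: `State`, `Op` and `apply` are invented names, and the sketch leaves out I/O entirely to focus on which transitions are valid.

```rust
/// Hypothetical voting-machine state.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct State {
    pub is_open: bool,
    pub yes: u32,
    pub no: u32,
}

/// The three operations: Init, Submit and Freeze.
pub enum Op {
    Init,
    Submit { vote_yes: bool },
    Freeze,
}

/// Apply one operation, rejecting transitions that are not allowed.
pub fn apply(state: &State, op: Op) -> Result<State, &'static str> {
    match op {
        // Init starts a fresh, open election.
        Op::Init => Ok(State { is_open: true, yes: 0, no: 0 }),
        Op::Submit { vote_yes } if state.is_open => {
            let mut next = state.clone();
            if vote_yes {
                next.yes += 1;
            } else {
                next.no += 1;
            }
            Ok(next)
        }
        // Freeze closes voting; the final tally is what gets revealed.
        Op::Freeze if state.is_open => Ok(State { is_open: false, ..state.clone() }),
        // Submitting or freezing after voting has closed is rejected.
        Op::Submit { .. } | Op::Freeze => Err("voting is closed"),
    }
}
```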
First, we can see that all common data structures are defined in the
core module.
The host has a function for each of the
operations, and each of them sends some input to the guest.
In the submit and
freeze functions the host also passes a
buffer to the guest to be filled with the result of the operation, but we'll
get there in time.
Analyzing the init function first, we can
see that the host simply sends the initial state to the guest using the
write method.
Such data is then read by the
init guest program and immediately
committed to the journal.
Note how easy it is to operate on data structures when using the standard
read and commit functions: no bit
manipulation is needed. It would be a different story if we were using the _slice
variants. Since this code isn't performance-critical, we
can safely use the standard functions.
Moving on to the submit function, we can
see that in the host an output
buffer is passed to the
stdout file descriptor of
the guest.
It'll be filled with values produced by the guest and then read by passing
the buffer to the from_slice function.
This can be seen in this
line.
The result that was filled in the buffer came from the
write method call in the
guest.
Remember, the write method is used to send data to the
host's stdout file descriptor.
Still in the submit function, note how
the private output from the guest is used, and how it's relevant to the
distinction between public and private data in this case.
In the example presented, the VotingMachineState struct is changed during the
guest's execution. But we don't want to commit (make public) the state of the
voting machine, so we use the stdout file descriptor to send the result back
to the host.
This way, we can update the voting machine state at each iteration while
preserving its privacy.
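The privacy split described above can be sketched as a guest function that returns two things: the full new state, which would travel back to the host over stdout, and a small record, which would be committed to the journal. `VotingState`, `SubmitCommit` and `submit` are hypothetical names; the example's real types and committed fields may differ.

```rust
/// Hypothetical private state: sent back over stdout, never committed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VotingState {
    pub yes: u32,
    pub no: u32,
}

/// Hypothetical public record committed for each submission: it reveals
/// how many ballots have been cast, not how anyone voted.
#[derive(Debug, PartialEq, Eq)]
pub struct SubmitCommit {
    pub total_ballots: u32,
}

/// Guest side of a submission: returns (private new state, public commit).
pub fn submit(state: &VotingState, vote_yes: bool) -> (VotingState, SubmitCommit) {
    let mut next = state.clone();
    if vote_yes {
        next.yes += 1;
    } else {
        next.no += 1;
    }
    let commit = SubmitCommit { total_ballots: next.yes + next.no };
    (next, commit)
}
```

The host keeps the returned state and feeds it back as input to the next submission, so the tally evolves across proofs without ever appearing in a journal.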
Finally, in the freeze function, the
same patterns of sending and receiving data are repeated.
Conclusions
In this guide, we've covered the basics of I/O in the zkVM. We've seen how to
send data from the host to the guest and vice-versa, how private and public
data are distinguished, and how to commit data to the journal.
We also covered the trade-offs between using the standard functions and their
_slice variants and showed through the Voting Machine
example how to implement I/O in practice.
There are more examples available in the examples page that you can use as
reference if you wish.
Happy coding!