Understanding I/O in the zkVM
In the Hello World Tutorial, we had a brief introduction to how to perform I/O operations in the zkVM. Now we'll dive deeper into the subject. Keep reading to learn more about:
- What are the different types of data in the zkVM
- How to handle inputs and outputs
- Best practices for handling I/O in the zkVM
Setting the Stage
We can think of programming in the zkVM as transitioning between two worlds: the host, where computation works the same as in any other regular program, and the guest, where computation is done in a zero-knowledge environment.
Since the guest works in a zk environment, it has fewer ways to obtain data than the host does. The only way to send data between the host and guest is through the Executor Environment. Such data transfer is done through file descriptors. The zkVM specifies four default file descriptors: stdin, stdout, stderr, and journal. They're defined in the fileno module.
The zkVM Data Model
The zkVM has a data model that distinguishes between public and private data. By "public" we mean data that is included in the journal and becomes part of the proof, while "private" data is only accessible by the host and guest.
If your application handles sensitive data, it's important to know exactly which data is committed to the journal, so that no sensitive data ends up in the proof.
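To make the public/private split concrete, here is a minimal sketch using only the Rust standard library. It does not use the risc0_zkvm API: GuestOutput, ToyReceipt, and run_guest are hypothetical names invented for illustration. The point it models is the one stated above: only what is committed to the journal travels with the proof, while everything else stays with the host.

```rust
/// Toy model of the guest's output channels.
/// Hypothetical type for illustration; not part of risc0_zkvm.
#[derive(Debug, Default)]
pub struct GuestOutput {
    /// Public: bytes committed to the journal, visible to anyone holding the receipt.
    pub journal: Vec<u8>,
    /// Private: bytes written to stdout, visible only to the host running the prover.
    pub stdout: Vec<u8>,
}

/// Toy stand-in for a receipt: it carries the journal and nothing else.
#[derive(Debug, PartialEq)]
pub struct ToyReceipt {
    pub journal: Vec<u8>,
}

impl GuestOutput {
    /// Append bytes to the public journal.
    pub fn commit(&mut self, bytes: &[u8]) {
        self.journal.extend_from_slice(bytes);
    }

    /// Append bytes to the private stdout buffer.
    pub fn write(&mut self, bytes: &[u8]) {
        self.stdout.extend_from_slice(bytes);
    }

    /// Split the output into the shareable receipt and the host-only private bytes.
    pub fn finish(self) -> (ToyReceipt, Vec<u8>) {
        (ToyReceipt { journal: self.journal }, self.stdout)
    }
}

/// Example guest logic: inspect a secret value, publish only the verdict.
pub fn run_guest(secret_age: u8) -> GuestOutput {
    let mut out = GuestOutput::default();
    let is_adult = secret_age >= 18;
    out.commit(&[is_adult as u8]); // public: a single yes/no byte
    out.write(&[secret_age]); // private: the secret itself goes back to the host only
    out
}
```

Calling run_guest(42).finish() yields a toy receipt whose journal holds only the verdict byte, while the secret age appears only in the private buffer returned alongside it.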
Sending data from the host to the guest
The stdin file descriptor is used to send input data from the host to the guest. In the host, it's possible to set the input data in the Executor Environment through the methods write and write_slice. The guest has corresponding functions read and read_slice to read the input data.
Writing to the guest's stdin can be as simple as the code below. For a real example, see the Voting Machine example.
use risc0_zkvm::ExecutorEnv;
let input = "Hello, guest!";
let env = ExecutorEnv::builder().write(&input)?.build()?;
Since we mentioned the read/write methods and their _slice variants, let's take a moment to understand the difference between them.
A note on performance
When sending data from host to guest and vice versa, we can either (de)serialize the data or send raw bytes. It's a trade-off between convenience and performance.
By using the standard functions read, write and commit (which we'll cover in the next section), the zkVM performs automatic (de)serialization of the data. This enables easy handling of complex data structures, but it comes with a performance cost. Using the _slice variants, on the other hand, allows for sending raw bytes, which is faster but usually requires less ergonomic code.
It is good practice to use the standard functions first, switching to the _slice variants when performance becomes an issue or when optimizing the code to save on cycles. Since both approaches can be used together in the same program, moving from one to the other shouldn't be a problem.
We have a more detailed explanation on guest code optimization if you want to learn more about this topic.
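The idea that typed and raw transfers can coexist on the same stream can be sketched with a toy in-memory channel. This is an illustration under stated assumptions, not the zkVM's implementation: ToyChannel and its methods are hypothetical, and the little-endian encoding is chosen only for the example and is not the zkVM's actual serialization format.

```rust
use std::collections::VecDeque;

/// Toy in-memory byte stream standing in for a host-to-guest channel.
/// Hypothetical type for illustration; not part of risc0_zkvm.
#[derive(Default)]
pub struct ToyChannel {
    bytes: VecDeque<u8>,
}

impl ToyChannel {
    /// Typed write: the channel picks the encoding (little-endian in this sketch).
    pub fn write_u32(&mut self, value: u32) {
        self.bytes.extend(value.to_le_bytes());
    }

    /// Typed read: returns None if fewer than four bytes are available.
    pub fn read_u32(&mut self) -> Option<u32> {
        if self.bytes.len() < 4 {
            return None;
        }
        let mut buf = [0u8; 4];
        for b in buf.iter_mut() {
            *b = self.bytes.pop_front()?;
        }
        Some(u32::from_le_bytes(buf))
    }

    /// Raw write: bytes are appended exactly as given.
    pub fn write_slice(&mut self, data: &[u8]) {
        self.bytes.extend(data.iter().copied());
    }

    /// Raw read: fills `out` completely, or returns false and consumes nothing.
    pub fn read_slice(&mut self, out: &mut [u8]) -> bool {
        if self.bytes.len() < out.len() {
            return false;
        }
        for b in out.iter_mut() {
            *b = self.bytes.pop_front().expect("length checked above");
        }
        true
    }
}
```

A sender could write a typed length with write_u32 followed by a raw payload with write_slice, and the receiver reads them back in the same order. The raw path skips any per-value encoding step, but both sides must agree on the layout themselves.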
Sending Private data from the guest
Back to where we were: after getting data from the host and performing some transformations on it, we might want to send private data back. Both the stdout and stderr file descriptors are used to send data from the guest to the host in a private manner, and a convenient way to send data to the host's stdout is by using the write method.
Writing to the host's stdout can be as simple as the code below. For a real example, see the Voting Machine example.
use risc0_zkvm::guest::env;
let data = "Hello, host!";
env::write(&data);
On the host side, data coming from the guest can be read from the buffer that was originally passed to the Executor Environment through its stdout and stderr methods.
The private data alluded to here is not included in the proof, but it is accessible to the host. This means that the party generating the proof can access the data, so you should take this into consideration. If you don't want to let private data leak to any other party, it's possible to achieve full secrecy by proving locally.
A good practice for handling sensitive data is to use proof composition: split the proving process into smaller parts, prove the part that touches sensitive data locally, and then combine it with the rest of the program through composition, using a capable proving service like Bonsai to speed up proof generation.
Sending Public data from the guest
We saw how to send private data directly to the host, but we might also want to commit public data, attesting to some fact that we want to share with the world. We can do so by sending this data to the journal file descriptor. This data will be included in the proof and can be accessed by any party through the Receipt after the proving process. Writing to the journal is done through the methods commit and commit_slice.
Writing to the journal can be as simple as the code below. For a real example, see the Voting Machine example.
use risc0_zkvm::guest::env;
let data = "Hello, journal!";
env::commit(&data);
On the host side (or in any other regular program that has access to the Receipt), reading from the journal can be achieved by simply calling the Journal's decode method.
Reading Private data in the host
Once we have sent data from the guest, we can read it back in the host by leveraging the from_slice method. This method deserializes the data from a buffer into the desired type.
Reading from the host's stdout can be as simple as the code below. For a real example, see the Voting Machine example.
use risc0_zkvm::serde::from_slice;
let result: Type = from_slice(&output)?;
If data was sent in its raw form by using a _slice variant, you'll need to decode the raw bytes manually.
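As an illustration of what manual decoding can look like, the sketch below turns a raw byte buffer back into u32 values using only the standard library. The helper names encode_u32s and decode_u32s are hypothetical, and the little-endian layout is an assumption made for this example: whatever layout the sender actually used is the one the receiver must mirror.

```rust
/// Encodes u32 values as raw little-endian bytes (layout assumed for this sketch).
pub fn encode_u32s(values: &[u32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Decodes raw little-endian bytes back into u32 values.
/// Returns None when the buffer length is not a multiple of four.
pub fn decode_u32s(raw: &[u8]) -> Option<Vec<u32>> {
    if raw.len() % 4 != 0 {
        return None;
    }
    Some(
        raw.chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Compared with the typed path, the caller here is responsible for length checks and byte order, which is the ergonomic cost mentioned earlier.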
Reading Public data in the host
Reading public data is done by accessing the Journal that is contained in the resulting Receipt after the proving process. This can be done by calling the decode method on the journal instance.
// Produce a receipt by proving the specified ELF binary.
let receipt = prover.prove(env, ELF).unwrap().receipt;
// Decode the journal to access the public data.
let public_data: Type = receipt.journal.decode()?;
Sharing data structures between host and guest
A good pattern to follow when handling shared data structures between the host and guest is to have a common core module that contains the shared data structures. This way, both host and guest can import common data structures and consume them as needed.
A good example of this pattern being used is the JWT Validator. In its core module, it defines common structures that are later used in the host and guest modules. Similarly, the Chess example does the same, with its core being used by the host and guest.
Other examples leveraging this pattern can be found in the examples page.
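The shape of this pattern can be sketched in a single file with three modules. In a real project the shared module would typically be its own crate that both the host and guest depend on; here it is named shared_core to avoid clashing with Rust's built-in core crate. The Order type and the function names are hypothetical and are not taken from the JWT Validator or Chess examples.

```rust
/// Shared definitions, imported by both sides.
pub mod shared_core {
    /// A data structure both host and guest need to agree on (hypothetical).
    #[derive(Debug, Clone, PartialEq)]
    pub struct Order {
        pub id: u32,
        pub quantity: u16,
    }

    impl Order {
        /// Fixed six-byte layout: four bytes of id, then two bytes of quantity.
        pub fn to_bytes(&self) -> [u8; 6] {
            let mut out = [0u8; 6];
            out[..4].copy_from_slice(&self.id.to_le_bytes());
            out[4..].copy_from_slice(&self.quantity.to_le_bytes());
            out
        }

        /// Inverse of `to_bytes`.
        pub fn from_bytes(bytes: &[u8; 6]) -> Self {
            let id = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
            let quantity = u16::from_le_bytes([bytes[4], bytes[5]]);
            Order { id, quantity }
        }
    }
}

/// Host side: builds the input using the shared type.
pub mod host {
    use super::shared_core::Order;

    pub fn encode_input(id: u32, quantity: u16) -> [u8; 6] {
        Order { id, quantity }.to_bytes()
    }
}

/// Guest side: interprets the same bytes using the same shared type.
pub mod guest {
    use super::shared_core::Order;

    pub fn decode_input(input: &[u8; 6]) -> Order {
        Order::from_bytes(input)
    }
}
```

Because both sides go through the same definition, a change to the structure or its layout only has to be made in one place.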
Putting it all together
Now that we've covered some details about I/O in the zkVM, let's see how a real program implements it in practice.
We'll cover the Voting Machine example. This example is a simple voting machine that allows users to vote for a candidate. We'll link to relevant parts of the code as we go along, and it's expected that you open the linked files in a separate tab to follow along.
The program is a state machine that supports three operations:
- Init: configures the initial state
- Submit: allows a user to submit a vote
- Freeze: reveals the result of the election and closes the voting
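Before looking at the real code, the three operations can be pictured with a toy state machine. This is only a sketch of the idea: ToyState, Op, and step are hypothetical names, and the real Voting Machine example uses its own types and decides for itself what each step commits. In this toy version, each step returns the next state (which would stay private) and, only on Freeze, a tally (which would be made public).

```rust
/// Toy voting state (hypothetical; not the example's real data structure).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ToyState {
    pub yes: u32,
    pub no: u32,
    pub frozen: bool,
}

/// The three operations of the toy state machine.
pub enum Op {
    Init,
    Submit { vote_yes: bool },
    Freeze,
}

/// Applies one operation.
/// Returns the next state and, for Freeze only, the tally to publish.
pub fn step(state: &ToyState, op: Op) -> (ToyState, Option<(u32, u32)>) {
    match op {
        // Start from a clean, unfrozen state.
        Op::Init => (ToyState::default(), None),
        // Count the vote unless voting has already been closed.
        Op::Submit { vote_yes } => {
            let mut next = state.clone();
            if !next.frozen {
                if vote_yes {
                    next.yes += 1;
                } else {
                    next.no += 1;
                }
            }
            (next, None)
        }
        // Close the voting and reveal the result.
        Op::Freeze => {
            let mut next = state.clone();
            next.frozen = true;
            let tally = (next.yes, next.no);
            (next, Some(tally))
        }
    }
}
```

Threading the state through successive calls to step mirrors how the host passes the current state in and receives the updated one back at each iteration.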
First, we can see that all common data structures are defined in the core module. The host has functions for each of the operations, and in each of them some input is sent to the guest. In the submit and freeze functions the host also passes a buffer to the guest to be filled with the result of the operation, but we'll get there in time.
Analyzing the init function first, we can see that the host simply sends the initial state to the guest using the write method. Such data is then read by the init guest program and immediately committed to the journal.
Note how easy it is to operate on data structures when using the standard read and commit functions: no bit manipulation needed. It'd be a different story if we were using the _slice variants. Since we don't have to worry about performance-critical code here, we can safely use the standard functions.
Moving on to the submit function, we can see that in the host an output buffer is passed to the stdout file descriptor of the guest. It is filled with values produced by the guest and then read by calling the from_slice method on the buffer. This can be seen in this line. The result written to the buffer came from the write method call in the guest. Remember, the write method is used to send data to the host's stdout file descriptor.
Still in the submit function, note how the private output from the guest is used, and how relevant the distinction between public and private data is in this case. In the example presented, the VotingMachineState struct is changed during the guest's execution. But we don't want to commit (make public) the state of the voting machine, so we use the stdout file descriptor to send the result back to the host. This way, we can update the voting machine state at each iteration while preserving its privacy.
Finally, in the freeze function, the same patterns of sending and receiving data are repeated.
Conclusions
In this guide, we've covered the basics of I/O in the zkVM. We've seen how to send data from the host to the guest and vice versa, how private and public data are distinguished, and how to commit data to the journal. We also covered the trade-offs between using the standard functions and their _slice variants and showed through the Voting Machine example how to implement I/O in practice.
There are more examples available in the examples page that you can use as reference if you wish.
Happy coding!