A port of xv6 to Rust, targeting RISC-V and running without the standard library. The original is written in C. My goal was not to transliterate it line by line, but to take the same ideas and express them in a way that is more modern and safer, and that takes advantage of what Rust has to offer.
- Implemented spin-locks and sleep-locks with RAII guards that enforce interrupt safety and automatic release through the type system.
- Built Sv39 page tables with a custom walker and distinct virtual and physical address types to catch address space mix-ups at compile time.
- Developed a full process lifecycle including fork, exec, context switching, and a round-robin scheduler running across multiple hardware threads.
- Implemented a crash-safe filesystem with write-ahead logging, a VirtIO disk driver, and a POSIX-style syscall interface.
- Wrote a user-space shell supporting pipes, I/O redirections, and background jobs.
xv6 is cool
xv6 is a small, clean Unix-like kernel written for teaching purposes. What makes it worth studying is that every subsystem is stripped down to its essential ideas with no distractions. Here are a few of the parts I found most interesting.
trampoline
When a user-space program wants to access shared resources, it must go through a system call. Getting this path right is non-trivial, and xv6 handles it in a clever way.
Let's follow a call to uptime() through its execution. It is important to keep in mind that the CPU fetches the instructions it executes through the active page table. I say active because there are two sets of page tables on this system: one for the kernel (Kvm) and one for each user-space process (Uvm).
```rust
// in uptime.rs (user-space)
fn main(_args: Args) {
    println!("{}", uptime());
}
```

Using a system call from user space triggers an interrupt. The CPU is configured to redirect execution to where the trap handler (uservec) is located in memory. This is where we encounter the big issue with this flow: for the kernel to start executing its own code, it needs to access its own memory, which requires switching from the user process's Uvm to the kernel's Kvm.
xv6 achieves this using a trampoline page. This special page of executable code is present in both the kernel's page table and every user process's page table. More importantly, it is always mapped at the same virtual address in every table. This way the CPU can keep executing from this page: when the switch happens, the next instruction is still there, at the next address.
```rust
// in trampoline.rs (both)
pub extern "C" fn uservec() { ... }
```

uservec saves all the registers in a special page called the trapframe (also mapped at the same address in every page table). These registers will be restored when we return to user space. But for now, with everything safely stored away, uservec switches to the Kvm and then calls the usertrap function. This function decodes the cause of the interrupt (in this case, a syscall) and forwards it to the correct part of the kernel.
```rust
// in trap.rs (kernel)
pub unsafe extern "C" fn usertrap() { ... }
```

Finally (skipping over some dispatching steps), we can run the actual logic for uptime and start our journey back to user space with a return value as the result.
```rust
// in sysproc.rs (kernel)
pub fn sys_uptime(_args: &SyscallArgs) -> Result<usize, SysError> { ... }
```

Once the syscall dispatcher returns to usertrap, execution is handed off to usertrapret, its counterpart on the return path. This function configures everything needed to handle the next interrupt from user space and for the user process to continue executing.
```rust
// in trap.rs (kernel)
pub unsafe extern "C" fn usertrapret() { ... }
```

usertrapret also figures out which Uvm belongs to the process we are returning to, and calculates the virtual address of the trampoline page in that page table. With all this information, it jumps to the userret function in the trampoline. Just like its counterpart uservec, this function lives in the trampoline page, so it can start executing in one address space and switch to the other without issue. Which is exactly what happens: the registers saved in the trapframe are restored, and the page table is switched to the previously calculated one.
```rust
// in trampoline.rs (both)
pub extern "C" fn userret(page_table: usize) { ... }
```

As you can see, this is not a trivial sequence: a single syscall crosses the user/kernel boundary twice, switches page tables twice, and passes through at least four distinct functions before the result is back in user space.
context switching
A multi-process kernel needs to share a single CPU across many processes. When the scheduler decides it is time to run a different process, it needs to save the current CPU state and restore the state of the next one. This is called a context switch, and it is done in swtch.
```rust
#[unsafe(naked)]
pub unsafe extern "C" fn swtch(old: &mut Context, new: &Context) {
    naked_asm!(
        "sd ra, 0(a0)",  // save return address
        "sd sp, 8(a0)",  // save stack pointer
        "sd s0, 16(a0)", // save s0-s11 (callee-saved)
        // ...
        "ld ra, 0(a1)",  // load new return address
        "ld sp, 8(a1)",  // load new stack pointer
        "ld s0, 16(a1)", // load new s0-s11
        // ...
        "ret",
    );
}
```

Notice that swtch only saves 14 registers: ra, sp, and s0–s11. These are the callee-saved registers, the ones the calling convention says a function must preserve. The caller-saved registers (a0–a7, t0–t6) are already the caller's responsibility to save before making any function call, including this one. So we get those for free, cutting the work roughly in half.
You might also notice the #[unsafe(naked)] attribute. This tells the compiler to emit no prologue or epilogue around our assembly. Any compiler-generated setup code would modify the very registers we are trying to save, so we need the raw assembly and nothing else.
The last oddity here is ret. Normally this instruction returns to the caller, but here it returns to whatever address is in the new context's return address register (ra). That address is the instruction after the call to swtch in the other process. The function appears to return normally, but we are now running on a completely different stack in a completely different process.
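The scheduler side of this is a loop over the process table. As a rough sketch (my own simplification, not octopos' actual scheduler loop), the round-robin choice boils down to scanning the table starting just after the slot that ran last:

```rust
// Toy round-robin pick over a table of runnable flags.
// Starting after `last`, return the index of the next runnable process.
fn pick_next(runnable: &[bool], last: usize) -> Option<usize> {
    let n = runnable.len();
    (1..=n).map(|k| (last + k) % n).find(|&i| runnable[i])
}

fn main() {
    let table = [false, true, false, true];
    println!("{:?}", pick_next(&table, 1));      // Some(3)
    println!("{:?}", pick_next(&table, 3));      // Some(1)
    println!("{:?}", pick_next(&[false; 4], 0)); // None: nothing to run
}
```

In the real kernel, each hart runs its scheduling loop forever and calls swtch into the saved context of whichever process it picks.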
sleep & wakeup
When waiting for I/O, busy-looping wastes CPU time on an operation that could take arbitrarily long to complete. Instead, we want to yield execution to another process and resume as soon as the resource becomes available. This is the sleep/wakeup mechanism.
A sleeping process specifies a Channel, identifying what condition it is waiting on. wakeup() scans all processes sleeping on that channel and marks them as runnable. The scheduler then picks them up on its next pass.
```rust
pub enum Channel {
    Ticks,
    PipeRead(usize),
    PipeWrite(usize),
    Lock(usize),
    // ...
}
```

Pipes, used for inter-process communication (IPC), are a great example of this. When a reader finds the pipe buffer empty, it sleeps on Channel::PipeRead. When a writer finishes writing, it calls wakeup(Channel::PipeRead) to wake any waiting readers, and vice versa when the buffer is full.
```rust
// in pipe::read(), sleep if the buffer is empty
let mut inner = self.inner.lock();
while inner.num_read == inner.num_write && inner.write_open {
    inner = proc::sleep(Channel::PipeRead(self.pipe_id()), inner);
}

// in pipe::write(), after writing, wake the reader
proc::wakeup(Channel::PipeRead(self.pipe_id()));
```

You might be wondering why the mutex guard inner is passed into the sleep function and re-assigned from its return value... Well, let me introduce you to a different kind of race condition: the lost-wakeup problem.
Consider a reader that checks the buffer, finds it empty, and is about to sleep, but the writer runs first, writes data, and calls wakeup() before the reader has actually gone to sleep (gasp). The reader then sleeps forever, waiting for a wakeup that already happened.
The fix is that sleep() takes ownership of the SpinLockGuard protecting the condition (the pipe's inner state), holds the process lock, and only then releases the condition lock. Since wakeup() must also acquire the process lock to mark it runnable, it cannot slip in between the check and the sleep.
```rust
pub fn sleep<T>(channel: Channel, condition_lock: SpinLockGuard<'_, T>) -> SpinLockGuard<'_, T> {
    let proc = current_proc();
    let mut inner = proc.inner.lock(); // acquire proc lock first
    let condition_mutex = SpinLock::unlock(condition_lock); // then release condition
    inner.channel = Some(channel);
    inner.state = ProcState::Sleeping;
    // ...context switch to scheduler...
    condition_mutex.lock() // reacquire condition before returning
}
```

The function signature enforces this protocol at compile time: you cannot call sleep() without handing over the condition guard, and you get it back when you wake up, ready to re-check the condition. This pattern is not specific to pipes; it appears throughout the kernel.
how can you improve upon this?
Translating xv6 to Rust is not just a matter of swapping syntax. Once you are working in a language with a rich type system and ownership semantics, you start to notice places where the original C code relies on convention and discipline to be correct. Rust lets you encode those conventions into the types themselves, turning runtime bugs into compile-time errors. The following sections highlight a few places where the port comes out meaningfully safer or more expressive than the original.
ergonomic locks
xv6 uses inodes to keep track of how many references point to each resource, such as a file. The dup() method duplicates a reference to an inode, which simply increments the reference counter by one. But xv6 is a multi-threaded kernel, so we have to make sure this number changes without a race condition: lock a mutex on the global inode table, increment the reference counter, then release it. The function ends by returning a copy of the pointer.
```c
struct inode* idup(struct inode *ip) {
    acquire(&itable.lock);
    ip->ref++;
    release(&itable.lock);
    return ip;
}
```

Now let's look at the same function in octopos:
```rust
impl Inode {
    pub fn dup(&self) -> Self {
        let mut meta = INODE_TABLE.meta.lock();
        meta[self.id].r#ref += 1;
        self.clone()
    } // `meta` dropped here
}
```

Like its counterpart, this function accepts a shared reference to the existing Inode structure, but it does not edit it directly since the reference it gets is immutable. The ref count actually lives inside a global struct called INODE_TABLE, and the Inode is a thin wrapper around an index into this table. This is the same global variable that xv6 acquired its lock from, but unlike in xv6, the guarded field ref is actually guarded in octopos: Rust's type system enforces that no one can modify this field without first acquiring the lock.
```rust
pub struct InodeTable {
    meta: SpinLock<[InodeMeta; NINODE]>,
    // ...
}
```

Another neat thing about octopos is how the lock gets released. There is no explicit release() method on the lock. Instead, this action is tied to the lifecycle of a guard value. When the user calls lock(), they receive a SpinLockGuard that, while held, gives them exclusive access to the inner data. When this guard goes out of scope (a.k.a. is dropped), it automatically unlocks the lock. For the curious, this is also how Mutex works in the standard library.
```rust
impl<'a, T: 'a> Drop for SpinLockGuard<'a, T> {
    fn drop(&mut self) {
        // ...
        self.lock.cpu.store(ptr::null_mut(), Ordering::Release);
    }
}
```

Now, comparing the two methods: we have done more setup around the logistics of the lock, but in return, we achieved so much more:
| | xv6 | octopos |
|---|---|---|
| data access | ip->ref can be accessed without a lock; the developer must know to acquire it | meta[id].ref lives inside the lock; the type system guarantees exclusive access |
| releasing | explicit release() call; easy to forget | the guard releases the lock when it goes out of scope; scopes or drop() give back manual control |
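To make the guard mechanics concrete, here is a minimal userland sketch of such a lock: an AtomicBool spin-lock whose guard releases on Drop. This is a simplified stand-in (no interrupt handling, no CPU tracking), not octopos' actual implementation.

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

pub struct SpinLock<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: access to `data` is serialized by the `locked` flag.
unsafe impl<T: Send> Sync for SpinLock<T> {}

pub struct SpinLockGuard<'a, T> {
    lock: &'a SpinLock<T>,
}

impl<T> SpinLock<T> {
    pub const fn new(data: T) -> Self {
        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }

    pub fn lock(&self) -> SpinLockGuard<'_, T> {
        // Spin until we flip `locked` from false to true.
        while self.locked.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
        SpinLockGuard { lock: self }
    }
}

impl<T> Deref for SpinLockGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { &*self.lock.data.get() }
    }
}

impl<T> DerefMut for SpinLockGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.lock.data.get() }
    }
}

impl<T> Drop for SpinLockGuard<'_, T> {
    // The unlock happens here, when the guard goes out of scope.
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release);
    }
}

fn main() {
    let counter = SpinLock::new(0);
    {
        let mut guard = counter.lock();
        *guard += 1;
    } // guard dropped: lock released
    println!("{}", *counter.lock()); // 1
}
```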
This scope-based cleanup extends further than just protecting the data inside the lock. Spin-locks also need to keep interrupts disabled while they are held. "Why?" you ask? Processes are interrupted many times during execution, for many different reasons. Generally, this is fine: the interrupt handler takes over execution on the same CPU, does its job, and yields back to the original process.
But what if the handler needs to hold the same lock to do its job? Since the lock is already held by the process, the interrupt handler cannot make any progress. So the handler never returns to the process, and the process can never release the lock... We got ourselves a deadlock.
The solution is to disable interrupts before acquiring the lock and re-enable them only after releasing it (making the code in between a critical section). In octopos, SpinLockGuard holds an InterruptLock as a private field, tying interrupt state directly to lock ownership.
```rust
pub struct SpinLockGuard<'a, T: 'a> {
    lock: &'a SpinLock<T>,
    _intr_lock: InterruptLock, // interrupts disabled for the duration
}
```

InterruptLock is a zero-sized type that re-enables interrupts when dropped. The guard acquires it when it is created and releases it when the guard itself is dropped. Interrupts stay disabled for exactly as long as the lock is held: no manual enable()/disable() calls, and no way to accidentally re-enable them early.
```rust
pub struct InterruptLock;

impl<T> SpinLock<T> {
    pub fn lock(&self) -> SpinLockGuard<'_, T> {
        let intr_lock = proc::lock_current_cpu();
        // ...
    }
}

impl Drop for InterruptLock {
    fn drop(&mut self) {
        unsafe { current_cpu().unlock() }
    }
}
```

not all numbers are created equal
One common mistake when writing kernel code is to confuse a physical address with a virtual one, or vice versa. Both are represented as a 64-bit unsigned integer, and the compiler could not care less about the semantic difference between them. Look at the copyin function, used to move data from user space into kernel memory:
```c
// Copy from user to kernel.
// Copy len bytes to dst from virtual address srcva in a given page table.
// Return 0 on success, -1 on error.
int copyin(pagetable_t pagetable, char *dst, uint64 srcva, uint64 len) { ... }
```

The va in the srcva argument indicates that the value should be a virtual address, but there is nothing to enforce this invariant. It is easy enough to wrap the uint64 (or u64 in Rust) in a struct and give it a new name to add type safety.
```rust
#[repr(transparent)]
pub struct PA(u64);

#[repr(transparent)]
pub struct VA(u64);
```

These types are marked transparent so they are represented in memory exactly as the inner integer would be. But the added layer of abstraction also adds friction to other parts of the code: even the simplest math operation on the new type no longer compiles.
```
error[E0369]: cannot add `{integer}` to `PA`
  | num + 1
  | --- ^ - {integer}
  | |
  | PA
```

Thankfully, Rust lets us teach these types the operations they need by implementing traits. With some macro magic to avoid repetition, we can re-introduce all the required operations in just a few lines of code.
```rust
impl_ops!(PA, Add, add, AddAssign, add_assign);
impl_ops!(PA, Sub, sub, SubAssign, sub_assign);
impl_ops!(PA, Rem, rem, RemAssign, rem_assign);
impl_ops!(PA, BitAnd, bitand, BitAndAssign, bitand_assign);
impl_ops!(PA, BitOr, bitor, BitOrAssign, bitor_assign);
impl_ops!(PA, BitXor, bitxor, BitXorAssign, bitxor_assign);
impl_cmp!(PA);
```

With these parts in place, we can re-write the function to accept only a VA, and the compiler will make sure we always abide by that rule.
```rust
/// User Page Table
pub struct Uvm(pub PageTable);

impl Uvm {
    /// Copy bytes from `src` virtual address in the current pagetable to `dst`.
    pub fn copy_from(&mut self, src: VA, dst: &mut [u8]) -> Result<(), VmError> { ... }
}
```

As a bonus, this function is implemented on the Uvm struct, so it can only be called with a reference to that page table. This removes the need to pass an extra pointer to every function call.
rust is not always easy
Rust gives a lot, but it also asks a lot. Working without the standard library and close to the hardware surfaces some of the rougher edges of the language.
static initialization
The kernel's process table needs to be a global static variable. In xv6, this is not a problem: the table is left uninitialized and populated later as the kernel boots. Rust does not allow uninitialized statics; everything must be initialized at compile time.
The standard library offers LazyLock for exactly this kind of deferred initialization, but that is a std-only type. No standard library, no LazyLock. I briefly implemented a crude version of it but ultimately decided the complexity wasn't worth it. So instead, I found a way to initialize the array at compile time.
Normally, you would use the [value; N] expression to initialize a fixed-size array, but the compiler requires the value inside the array to be Copy. In our case, Proc is wrapped in UnsafeCell for interior mutability, and that very much does not implement Copy.
```rust
// Fine: u64 is Copy.
static TABLE: [u64; 64] = [0; 64];
```

That leaves us with one option: build the array manually in a const fn. Specifically, we need to create an array of the exact final size on the stack, uninitialized, and assign each element one by one.

The way to achieve this is MaybeUninit, a wrapper type that tells the compiler to make no assumptions about the value's memory contents, since it might be uninitialized. We create an array of MaybeUninit<UnsafeCell<Proc>>, fill each slot in a while loop with the correct index, and then transmute the whole thing into the type we actually wanted.
```rust
pub const fn new() -> Self {
    let mut table: [MaybeUninit<UnsafeCell<Proc>>; NPROC] =
        unsafe { MaybeUninit::uninit().assume_init() };
    let mut i = 0;
    while i < NPROC {
        table[i] = MaybeUninit::new(UnsafeCell::new(Proc::new(i)));
        i += 1;
    }
    Self {
        table: unsafe { transmute(table) },
        // ...
    }
}
```

And we are not done yet. UnsafeCell is not Sync, so the compiler will refuse to put it in a static. We have to manually tell the compiler that sharing across threads is safe, and take on the responsibility of upholding that ourselves.
```rust
unsafe impl Sync for ProcTable {}
```

It is not pretty, but it is honest. Rust is not hiding the danger here; it is making you acknowledge it explicitly with all those unsafe keywords. The end result is the same fixed-size process table you would write in C, but you had to argue your case to the compiler every step of the way.
While researching for this write-up, I realized that since I wrote the MaybeUninit and transmute code, Rust has improved this workflow drastically. Namely, the inline-const feature got stabilized. This allows us to use the [const { non_copy_value }; N] expression to create an array by re-evaluating the initializer N times at compile time. We would still need a const fn to adjust the id of each process, but this change would get rid of all the unsafe usage in that function.
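A sketch of what that looks like (assumes Rust 1.79+, where inline const expressions are stable; Proc here is a stand-in for the real struct):

```rust
const NPROC: usize = 4;

// A non-Copy stand-in for the real process struct.
#[derive(Debug)]
struct Proc {
    id: usize,
    name: String, // String is not Copy, so neither is Proc
}

fn main() {
    // The const block is re-evaluated for each element, so the
    // initializer does not need to be Copy. No MaybeUninit, no transmute.
    let mut table: [Proc; NPROC] = [const { Proc { id: 0, name: String::new() } }; NPROC];

    // A follow-up pass still assigns each process its index.
    let mut i = 0;
    while i < NPROC {
        table[i].id = i;
        i += 1;
    }

    let ids: Vec<usize> = table.iter().map(|p| p.id).collect();
    println!("{:?}", ids); // [0, 1, 2, 3]
}
```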
the process struct
Another place Rust pushed back was the process struct itself. A process has two kinds of data: state that other CPUs may read (scheduling state, kill flag, sleep channel) and private data that only the running process ever touches (page table, trap frame, open files, context). In C, these all live in one struct and the developer keeps track of what needs a lock and what does not.
```c
struct proc {
    struct spinlock lock;

    // p->lock must be held when using these:
    enum procstate state;        // Process state
    int killed;                  // If non-zero, have been killed
    int pid;                     // Process ID
    // ...

    // these are private to the process, so p->lock need not be held.
    pagetable_t pagetable;       // User page table
    struct trapframe *trapframe; // data page for trampoline.S
    struct context context;      // swtch() here to run process
    // ...
};
```

In Rust, this distinction has to be made explicit. The shared state lives in a SpinLock<ProcInner>, but for the private data, a spin-lock is not the right tool. It would work, but acquiring a lock to access your own process data is unnecessary overhead and would complicate code paths like context switching that cannot hold locks.
The solution is UnsafeCell<ProcData>. It gives interior mutability without a lock, which is exactly what we need.
```rust
pub struct Proc {
    pub id: usize,
    pub inner: SpinLock<ProcInner>, // shared state, protected by lock
    data: UnsafeCell<ProcData>,     // private data, no lock needed
}
```

The catch is that UnsafeCell provides no safety guarantees on its own. To get a mutable reference to the data, you call an unsafe method and assert that you know what you are doing. Besides the compiler, even clippy tries to warn us against our crimes here, and we gracefully disregard it.
```rust
/// # Safety
/// The caller must ensure they have exclusive access to the `Proc`. This is true if either
/// 1. it's the current proc (most cases) or
/// 2. the proc's state hasn't been set to Runnable/Sleeping yet (fork, allocproc).
#[allow(clippy::mut_from_ref)]
pub unsafe fn data_mut(&self) -> &mut ProcData {
    unsafe { &mut *self.data.get() }
}
```

The comment is doing the work the type system cannot. Rust has no way to express "only the currently running process may call this." That invariant exists, it is real, but it lives in documentation rather than types. This is one of those cases where the language cannot guard us, but at least unsafe signals the danger to every caller.
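A stripped-down userland model of this pattern (illustrative names; a u64 stands in for the real ProcData):

```rust
use std::cell::UnsafeCell;

pub struct Proc {
    data: UnsafeCell<u64>, // stand-in for ProcData
}

// SAFETY: callers of `data_mut` promise exclusive access (see below).
unsafe impl Sync for Proc {}

impl Proc {
    pub const fn new() -> Self {
        Self { data: UnsafeCell::new(0) }
    }

    /// # Safety
    /// The caller must guarantee no other reference to the data exists,
    /// e.g. because this is the currently running process.
    #[allow(clippy::mut_from_ref)]
    pub unsafe fn data_mut(&self) -> &mut u64 {
        unsafe { &mut *self.data.get() }
    }
}

fn main() {
    let proc = Proc::new();
    // SAFETY: single-threaded here, so access is exclusive.
    unsafe { *proc.data_mut() += 1 };
    let value = unsafe { *proc.data_mut() };
    println!("{}", value); // 1
}
```

The compiler accepts this, but every call site carries an unsafe block, which is exactly the visibility the pattern is meant to provide.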
epilogue
Building octopos taught me more about how computers actually work than anything I had done before. Page tables, trap frames, and context switches were all concepts I had learned about, but implementing them from scratch forced a much deeper understanding. There is no better way to learn how something works than to break it repeatedly until it finally doesn't.
Rust was a genuine pleasure throughout. The type system caught entire classes of bugs before the code ever ran, and patterns like RAII made resource management feel almost effortless compared to what the C implementation had to juggle manually.
As long as I made the compiler happy, the code I wrote generally worked on the first try. Reaching an agreement with the compiler is easier said than done, though, especially when working in such low-level code. But I will take this kind of developer experience over debugging a segfault any day of the week.
There is no shortage of things left to explore, and I am sure I will be back.
use of ai
Both this post and the code are written by me, a human. I recently started trying out Claude in my workflow, but I refuse to let it write code for me. Writing code is my passion, and I do it because I enjoy it; babysitting a computer to do that job for me sounds incredibly boring. However, I have found LLMs useful for proof-reading my code, sometimes finding small bugs, sometimes suggesting more idiomatic ways to write things. Kind of like a solo code review.
See this project on git.