Rust Ownership: Understanding Pinning and Self-Referential Structs

One of the most abrupt complexity spikes in Rust occurs when you attempt to define a struct where one field refers to another field within the same struct. This is the "Self-Referential Struct" problem. Developers building custom async executors, zero-copy parsers, or intrusive linked lists will almost certainly run into it.

The compiler rejects any naive attempt to create this structure with borrow-checker errors, usually reporting that a value is moved (or returned) while it is still borrowed. To solve this, we must drop down to std::pin and raw pointers, manually enforcing the invariants that the borrow checker cannot verify.

The Root Cause: Moves and Memory Addresses

To understand why Rust forbids self-references by default, you must understand the mechanical implication of a "Move."

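In Rust, any value can be moved. A move is a plain bitwise copy (a memcpy) of the value's bytes to a new location, and there is no move constructor that could fix up internal pointers. Moves happen constantly: returning from a function, passing by value, assigning to another variable, boxing a value with Box::new, or when a Vec reallocates its storage. Almost every type also implements the Unpin auto trait, but Unpin does not decide whether a value can be moved in general. It only matters once the value sits behind a Pin, as described below.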

Consider a struct Parser living at memory address 0x1000.

It has a buffer: String field, assumed for illustration to sit at offset 0 (0x1000).
It has a cursor: &String field intended to point to buffer.
If initialized, cursor holds the address 0x1000.

If we move Parser (for example, by returning it from a function), its bytes might be copied to address 0x2000.

buffer is now at 0x2000.
However, the cursor field was copied byte-for-byte. It still contains 0x1000.
cursor is now a dangling pointer: 0x1000 no longer holds a Parser, and that memory may be reused for something else.
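You can observe the move directly. The sketch below uses a hypothetical Parser that keeps only the buffer field. It moves the value from the stack into a Box and compares addresses. The struct itself, including the String's pointer/length/capacity header, lands at a new address. The String's text lives in its own heap allocation and stays where it was.

```rust
// Hypothetical stand-in for the Parser above; only the `buffer` field is kept.
struct Parser {
    buffer: String,
}

/// Returns (struct address before, struct address after,
///          text address before,   text address after)
/// for a Parser moved from the stack into a Box.
fn addresses_across_move() -> (usize, usize, usize, usize) {
    let parser = Parser { buffer: String::from("Hello") };
    let struct_before = &parser as *const Parser as usize;
    let text_before = parser.buffer.as_ptr() as usize;

    // Moving into a Box copies the struct's bytes (including the String's
    // ptr/len/capacity header) into a fresh heap allocation.
    let boxed = Box::new(parser);
    let struct_after = &*boxed as *const Parser as usize;
    let text_after = boxed.buffer.as_ptr() as usize;

    (struct_before, struct_after, text_before, text_after)
}

fn main() {
    let (sb, sa, tb, ta) = addresses_across_move();
    println!("struct: {:#x} -> {:#x}", sb, sa); // different addresses
    println!("text:   {:#x} -> {:#x}", tb, ta); // same address
}
```

This is why a pointer to the buffer field dangles after a move, while a pointer into the String's text does not.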

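Safe Rust rejects this pattern because the borrow checker cannot verify that the struct will never move after the reference is created. At best, a lifetime-annotated version compiles, but it leaves the value permanently borrowed by itself, so the value can never be moved again.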
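For comparison, this sketch shows how far safe Rust gets on its own. It uses a hypothetical lifetime-annotated Parser<'a>. The self-reference compiles, but only because the value then stays borrowed by itself for as long as the reference is used.

```rust
// Hypothetical lifetime-annotated version of the Parser from the walkthrough.
struct Parser<'a> {
    buffer: String,
    cursor: Option<&'a String>,
}

fn cursor_text() -> String {
    let mut parser = Parser { buffer: String::from("Hello"), cursor: None };
    // Borrow one field and store that borrow in another field of the same value.
    parser.cursor = Some(&parser.buffer);
    // From here on `parser` is borrowed by itself while `cursor` is in use.
    // Moving it (e.g. `let moved = parser;` followed by a use of `moved.cursor`)
    // is rejected with E0505: "cannot move out of `parser` because it is borrowed".
    parser.cursor.map(|s| s.clone()).unwrap_or_default()
}

fn main() {
    println!("{}", cursor_text());
}
```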

The Solution: Pinning and PhantomPinned

To safely implement a self-referential struct, we must create a type-level contract guaranteeing that the object will not be moved in memory once it has been pinned.

We achieve this using:

std::marker::PhantomPinned: A marker that opts the struct out of the Unpin trait.
std::pin::Pin: A wrapper around a pointer (such as Box<T> or &mut T). When T is !Unpin, it withholds safe access to &mut T, which is what you would need to move or swap the data.
std::ptr::NonNull: A non-null raw pointer wrapper. We use it for the internal reference so that the reference is not subject to the borrow checker's lifetime tracking.
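The sketch below shows how these pieces interact, using two hypothetical types. Plain is an ordinary Unpin struct, and Pinned carries PhantomPinned. Pin::new and mutable access through a Pin are available only for the Unpin type. The !Unpin type can still be pinned with Box::pin and read through it.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Compiles only when T implements Unpin.
fn assert_unpin<T: Unpin>() {}

// Hypothetical example types.
struct Plain {
    value: u32,
}

struct Pinned {
    value: u32,
    _pin: PhantomPinned, // opts Pinned out of Unpin
}

fn pin_demo() -> (u32, u32) {
    assert_unpin::<Plain>();
    // assert_unpin::<Pinned>(); // would not compile: Pinned is !Unpin

    let mut plain = Plain { value: 1 };
    // Pin::new requires Unpin; for such types Pin also implements DerefMut.
    let mut pinned_plain = Pin::new(&mut plain);
    pinned_plain.value = 2;

    // Pin::new(&mut Pinned { .. }) would not compile. Box::pin accepts any type,
    // and reading through the Pin is allowed.
    let pinned = Box::pin(Pinned { value: 3, _pin: PhantomPinned });
    // pinned.value = 4; // would not compile: no DerefMut, since Pinned is !Unpin

    (plain.value, pinned.value)
}

fn main() {
    println!("{:?}", pin_demo()); // (2, 3)
}
```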

The Implementation

Here is a complete, compilable implementation of a self-referential struct. This example mimics a parser that holds its own text buffer and a "cursor" pointing to a slice of that text.

use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::NonNull;

// 1. The Struct Definition
struct SelfRefParser {
    // The owner of the data
    buffer: String,
    // The "self-reference". We use NonNull (a raw pointer wrapper) 
    // because we cannot use a standard lifetime-bound reference here.
    slice_ptr: Option<NonNull<str>>,
    // This marker makes the struct !Unpin.
    _pin: PhantomPinned,
}

impl SelfRefParser {
    /// Create a new, pinned instance on the heap.
    /// We return Pin<Box<Self>> because the struct must be heap-allocated
    /// to ensure a stable address before we set the self-reference.
    pub fn new(content: String) -> Pin<Box<Self>> {
        let parser = SelfRefParser {
            buffer: content,
            slice_ptr: None, // Initialize as empty first
            _pin: PhantomPinned,
        };

        // Move into a Box immediately. This allocation creates the stable address.
        let mut boxed = Box::pin(parser);

        // 2. The Initialization Trick
        // Only now, after the value has reached its final heap location, do we
        // take a pointer into `buffer`. `NonNull::from(&str)` records both the
        // address of the text and its length.
        let slice_ptr = NonNull::from(boxed.buffer.as_str());

        // We cannot mutate `boxed` through safe code: Pin only implements
        // DerefMut for Unpin types, and SelfRefParser is !Unpin.
        // `get_unchecked_mut` bypasses that protection strictly for initialization.
        // SAFETY: We only write to a field; nothing is moved out of the pinned value.
        unsafe {
            let mut_ref = Pin::get_unchecked_mut(boxed.as_mut());
            mut_ref.slice_ptr = Some(slice_ptr);
        }

        boxed
    }

    /// Example method accessing the self-reference
    pub fn get_slice(&self) -> &str {
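        // We must reconstruct the reference from the raw pointer.
        // SAFETY: `slice_ptr` was taken from `buffer` after pinning and is
        // refreshed by every method that mutates `buffer`. Safe code cannot
        // obtain `&mut Self` to change `buffer` behind our back, so the pointer
        // still refers to live text.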
        unsafe {
            self.slice_ptr
                .as_ref()
                .expect("Parser uninitialized")
                .as_ref()
        }
    }

    /// Example of modifying the internal state
    /// Note: We take `Pin<&mut Self>`, not `&mut Self`
    pub fn append_and_update(mut self: Pin<&mut Self>, suffix: &str) {
        // To modify fields we need `&mut Self`, which Pin only hands out through unsafe code.
        // SAFETY: We mutate fields in place and never move `*inner` out of the pin.
        unsafe {
            let inner = Pin::get_unchecked_mut(self.as_mut());
            inner.buffer.push_str(suffix);

            // CRITICAL: push_str may reallocate the String's heap buffer, and even
            // without a reallocation the old fat pointer still records the old length.
            // We must refresh the pointer after every mutation of `buffer`.
            inner.slice_ptr = Some(NonNull::from(inner.buffer.as_str()));
        }
    }
}

fn main() {
    // 1. Creation
    let mut parser = SelfRefParser::new("Hello".to_string());
    
    println!("Initial: {}", parser.get_slice()); // Output: Hello

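    // 2. Mutation
    // Moving `parser` (the Pin<Box>) is still allowed, because that only copies
    // the box pointer. What Pin forbids is safe access to `&mut SelfRefParser`,
    // so the struct itself can never be moved out of its heap slot.
    parser.as_mut().append_and_update(", World");

    println!("Updated: {}", parser.get_slice()); // Output: Hello, World

    // 3. Verification of Address Stability
    // `std::mem::swap(&mut *parser, &mut *other)` would not compile: Pin<Box<T>>
    // implements DerefMut only when T: Unpin, and PhantomPinned makes
    // SelfRefParser !Unpin. `let p2 = *parser;` is rejected too, because a value
    // cannot be moved out from behind Pin's Deref.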
}

Why This Works

1. The `!Unpin` Marker

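Because it contains PhantomPinned, SelfRefParser no longer implements Unpin (it is !Unpin). As a result, Pin<Box<SelfRefParser>> and Pin<&mut SelfRefParser> will not hand out a safe &mut SelfRefParser. Safe code therefore cannot call functions like mem::swap or mem::replace on the pinned value to move it out of place.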

2. Heap Allocation (`Box::pin`)

Pinning on the stack is possible (for example, with the std::pin::pin! macro). However, a stack-pinned value cannot outlive the stack frame that created it, so it cannot be returned from a constructor like new. By using Box::pin, we allocate the struct on the heap instead. A heap allocation keeps the same address for its entire lifetime, no matter how often the owning Box pointer itself is moved around.
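The sketch below illustrates both claims with a hypothetical !Unpin type, Node. First, moving a Pin<Box<Node>> into a Vec leaves the Node at the same heap address. Second, std::pin::pin! pins a value on the stack, where it can be used inside the function but not returned from it.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

// Hypothetical !Unpin type used only for this demonstration.
struct Node {
    value: u32,
    _pin: PhantomPinned,
}

// Moves a Pin<Box<Node>> into a Vec and checks whether the Node itself moved.
fn heap_address_is_stable() -> bool {
    let boxed: Pin<Box<Node>> = Box::pin(Node { value: 7, _pin: PhantomPinned });
    let before: *const Node = &*boxed;
    // Moving the Box copies only the pointer; the Node stays in its allocation.
    let holder = vec![boxed];
    let after: *const Node = &*holder[0];
    before == after
}

// Pins a Node on the stack. The Pin<&mut Node> borrows this frame, so it could
// not be returned from the function the way a Pin<Box<Node>> can.
fn stack_pinned_value() -> u32 {
    let pinned: Pin<&mut Node> = pin!(Node { value: 42, _pin: PhantomPinned });
    pinned.value
}

fn main() {
    println!("heap address stable: {}", heap_address_is_stable());
    println!("stack-pinned value: {}", stack_pinned_value());
}
```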

3. `unsafe` Initialization

The initialization phase is the critical moment.

We create the struct with None.
We pin it to the heap.
Only then do we take the address of buffer.

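If we took the address of the buffer field before boxing, we would get the address of a stack slot. Box::pin would then copy the struct to the heap, and our pointer would still refer to the old (now invalid) stack slot. Strictly speaking, slice_ptr in this example points at the String's text, which lives in a separate heap allocation and stays put when the String itself is moved. The ordering matters for any data stored inline in the struct, which is exactly the case the Pin contract exists to protect.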
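To make the ordering concrete for a pointee that really is stored inline, here is a sketch of a hypothetical variant, FieldRef. Its cursor points at the buffer field itself (the String header), like the cursor: &String in the earlier walkthrough. The pointer is taken only after Box::pin has placed the value at its final address, and it stays valid when the Pin<Box> is moved.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::NonNull;

// Hypothetical variant whose pointer targets the `buffer` field itself
// (the String's pointer/length/capacity header, stored inline in the struct).
struct FieldRef {
    buffer: String,
    cursor: Option<NonNull<String>>,
    _pin: PhantomPinned,
}

impl FieldRef {
    fn new(text: &str) -> Pin<Box<Self>> {
        // Step 1: build the value with no self-pointer and pin it on the heap.
        let mut boxed = Box::pin(FieldRef {
            buffer: text.to_string(),
            cursor: None,
            _pin: PhantomPinned,
        });
        // Step 2: only now is `buffer` at its final address, so take the pointer.
        let field_ptr = NonNull::from(&boxed.buffer);
        // SAFETY: we write one field in place and never move the value out of the pin.
        unsafe {
            boxed.as_mut().get_unchecked_mut().cursor = Some(field_ptr);
        }
        boxed
    }

    // True if `cursor` holds the current address of our own `buffer` field.
    fn cursor_points_at_own_field(&self) -> bool {
        self.cursor
            .map_or(false, |p| p.as_ptr() as *const String == &self.buffer as *const String)
    }

    fn cursor_text(&self) -> &str {
        // SAFETY: the value is pinned and `buffer` is never reassigned, so the
        // recorded field address is still valid.
        unsafe { self.cursor.expect("cursor not initialized").as_ref().as_str() }
    }
}

fn main() {
    let parser = FieldRef::new("Hello");
    // Moving the Pin<Box> moves only the box pointer, not the FieldRef.
    let moved = vec![parser];
    println!("{} {}", moved[0].cursor_text(), moved[0].cursor_points_at_own_field());
}
```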

4. Updating Pointers

Notice the append_and_update method. Even though the SelfRefParser struct itself doesn't move (it's pinned), the String inside it manages its own heap buffer. If the string grows and reallocates that buffer, the old slice_ptr dangles. Even without a reallocation, the old pointer would be stale. A NonNull<str> is a fat pointer that stores the length alongside the address, so the old pointer would still describe only the original "Hello".

When dealing with self-referential structs, you are responsible for keeping every internal pointer valid. If the data being pointed to is dynamic (like a Vec or String), you must refresh your internal pointers after every operation that might change it.
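The fat-pointer detail above can be checked without dereferencing anything. The sketch below uses hypothetical helper names. It reads the length recorded in a NonNull<str> captured before a push_str and compares it with a pointer refreshed afterwards.

```rust
use std::ptr::NonNull;

/// Length stored in a `NonNull<str>` fat pointer. Only the pointer's metadata
/// is read; the pointee is never dereferenced, so this is fine for a stale pointer.
fn recorded_len(ptr: NonNull<str>) -> usize {
    // Casting `*mut str` to `*mut [u8]` keeps the same length metadata.
    let bytes: *mut [u8] = ptr.as_ptr() as *mut [u8];
    NonNull::new(bytes).expect("derived from a NonNull").len()
}

fn lengths_after_push() -> (usize, usize) {
    let mut buffer = String::from("Hello");
    let stale = NonNull::from(buffer.as_str());
    // This may or may not reallocate; either way `stale` still records 5 bytes.
    buffer.push_str(", World");
    let fresh = NonNull::from(buffer.as_str());
    (recorded_len(stale), recorded_len(fresh))
}

fn main() {
    let (stale, fresh) = lengths_after_push();
    println!("stale pointer covers {stale} bytes, refreshed pointer covers {fresh} bytes");
}
```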

Conclusion

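Self-referential structs are the mechanism behind Rust's async/.await state machines. When you write an async block, the compiler generates an anonymous future type: a state machine that stores the locals that are live at each .await point. If a local is borrowed across an .await, the future holds a reference into itself. This is why async futures are !Unpin and why Future::poll takes self: Pin<&mut Self>.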
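To see where Pin shows up for async code, here is a minimal sketch. An async block holds a borrow of its own local across an .await. A tiny busy-polling helper drives it; the helper is named block_on here, which is a hypothetical name and not a standard-library function, and it uses a do-nothing waker. Because Future::poll takes self: Pin<&mut Self>, the future has to be pinned (here with Box::pin) before it can be polled.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing; enough for futures that never actually wait.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

// Minimal busy-polling executor (a sketch, not production code).
fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: every vtable function ignores the data pointer and does nothing.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    // Async blocks are !Unpin, so the future must be pinned before it is polled.
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn main() {
    let fut = async {
        let text = String::from("Hello");
        let view: &str = &text; // a borrow of a local...
        std::future::ready(()).await; // ...held across an .await point
        view.len()
    };
    println!("{}", block_on(fut)); // 5
}
```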

For application-level code, avoid writing this manual boilerplate unless necessary. Prefer crates like ouroboros or self_cell, which hide this unsafe logic behind a safe, macro-generated API. However, for systems programming, understanding the relationship between Pin, !Unpin, and stable memory addresses is non-negotiable.
