Rust Async Architecture: Box<dyn Future> vs impl Future for Public APIs

Designing a public async API in Rust forces you to make a difficult trade-off immediately: how do you return a Future?

For years, the ecosystem relied on the async-trait macro (dynamic dispatch). With the stabilization of async fn in traits (Rust 1.75+), library authors now have native options. However, this introduces a new complexity: choosing between the flexibility of dynamic dispatch (Box<dyn Future>) and the performance of static dispatch (impl Future).

This choice dictates your library's performance profile, thread-safety guarantees, and how painful it is for downstream users to integrate your code.

The Root Cause: Sizedness and State Machines

To make the right architectural decision, you must understand what happens inside the compiler when you write async fn.

Rust implements async/await using compiler-generated state machines. When you compile an async function, Rust creates a unique, anonymous enum that holds the function's state across .await points.

Because this state machine contains all local variables present at suspension points, its size varies wildly.

  1. Unique Types: Every async function returns a distinct type.
  2. Unknown Size: A trait method signature needs a concrete return size to determine the stack layout.
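Both properties can be observed directly with std::mem::size_of_val. In this sketch (the function names small and large are illustrative), the async fn holding a 1 KiB buffer across an .await point produces a state machine at least 1 KiB in size, while its sibling stays tiny:

```rust
use std::mem::size_of_val;

async fn small() -> u8 {
    // No locals are held across the .await, so the state machine stays tiny
    std::future::ready(1u8).await
}

async fn large() -> u8 {
    // `buf` is live across the .await, so it is stored
    // inside the compiler-generated state machine
    let buf = [0u8; 1024];
    std::future::ready(()).await;
    buf[0]
}

fn main() {
    // Each async fn returns its own anonymous type with its own size
    println!("small: {} bytes", size_of_val(&small()));
    println!("large: {} bytes", size_of_val(&large()));
}
```

Note that `small()` and `large()` do not even have nameable types, let alone the same type, which is exactly why a trait cannot return them directly.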

This creates the fundamental conflict. If you define a trait Service, the compiler needs to know what Service::call returns. But if impl Service for A and impl Service for B return state machines of different sizes, the trait definition breaks.

You have two mechanisms to align these types: Erasure (Dynamic) or Inference (Static).

Approach 1: Dynamic Dispatch (Box<dyn Future>)

This is the "classic" approach, popularized by the async-trait crate. You force every implementation to return a heap-allocated pointer.

The Mechanism

Instead of returning the raw state machine, you wrap it in a Box. Since a boxed trait object is just a pointer pair (data pointer plus vtable pointer, 16 bytes on 64-bit systems) with a fixed, known size, the trait definition becomes valid.

use std::future::Future;
use std::pin::Pin;

// The "Old" Standard (roughly what async-trait expands to)
pub trait StorageDriver {
    fn store(&self, data: String) -> Pin<Box<dyn Future<Output = Result<(), String>> + Send + '_>>;
}

pub struct DiskDriver;

impl StorageDriver for DiskDriver {
    fn store(&self, data: String) -> Pin<Box<dyn Future<Output = Result<(), String>> + Send + '_>> {
        Box::pin(async move {
            // Async logic here
            Ok(())
        })
    }
}

The Trade-offs

  • Pros: It is object-safe. You can hold a Vec<Box<dyn StorageDriver>> and iterate over heterogeneous implementations. It compiles fast.
  • Cons: Allocation. Every function call triggers a malloc. This puts pressure on the global allocator and harms cache locality.
  • Cons: No Inlining. The compiler cannot see through the dyn pointer to optimize the inner state machine.
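The object-safety payoff looks like this in practice. Here is a self-contained sketch of the pattern (MemoryDriver and the hand-rolled block_on driver are illustrative, not part of any library): two different drivers share one boxed signature, so they can live in the same Vec.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

pub trait StorageDriver {
    fn store(&self, data: String) -> Pin<Box<dyn Future<Output = Result<(), String>> + Send + '_>>;
}

pub struct DiskDriver;
pub struct MemoryDriver; // hypothetical second backend

impl StorageDriver for DiskDriver {
    fn store(&self, _data: String) -> Pin<Box<dyn Future<Output = Result<(), String>> + Send + '_>> {
        Box::pin(async move { Ok(()) })
    }
}

impl StorageDriver for MemoryDriver {
    fn store(&self, _data: String) -> Pin<Box<dyn Future<Output = Result<(), String>> + Send + '_>> {
        Box::pin(async move { Ok(()) })
    }
}

// Minimal polling loop with a no-op waker; enough for futures
// that resolve without ever yielding to a real executor
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    // Heterogeneous collection: every driver erases to the same boxed type
    let drivers: Vec<Box<dyn StorageDriver>> =
        vec![Box::new(DiskDriver), Box::new(MemoryDriver)];
    let mut ok = 0;
    for d in &drivers {
        if block_on(d.store("payload".into())).is_ok() {
            ok += 1;
        }
    }
    assert_eq!(ok, 2);
}
```

The price is visible in the loop: every call to store() allocates a fresh Pin<Box<...>> before any work happens.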

Approach 2: Static Dispatch (Native async fn)

As of Rust 1.75, you can write async fn directly in traits. This uses RPITIT (Return Position Impl Trait In Trait).

The Mechanism

The compiler effectively treats the return type as an associated type. It doesn't erase the type; it propagates the specific anonymous type of the state machine up the chain.

// Modern Rust (1.75+)
pub trait NetworkClient {
    // This desugars to returning -> impl Future<...>
    async fn fetch(&self, url: String) -> Result<String, String>;
}

pub struct HttpClient;

impl NetworkClient for HttpClient {
    async fn fetch(&self, url: String) -> Result<String, String> {
        // Zero allocation state machine
        Ok("payload".to_string())
    }
}

The Hidden Trap: Send Bounds

This approach is zero-cost, but it introduces a major concurrency hazard for library authors.

In the Box example, we explicitly wrote + Send. With native async fn, the returned Future is Send only if everything it holds across .await points is Send, and in generic code the compiler cannot prove that without an explicit bound.

If a downstream user tries to spawn a task using your library:

fn spawn_fetch<C: NetworkClient + Send + 'static>(client: C) {
    tokio::spawn(async move {
        // ERROR: future returned by `client.fetch(...)` cannot be sent between threads safely
        client.fetch("http://...".to_string()).await;
    });
}

The compiler will reject this because it cannot guarantee the hidden state machine returned by fetch is thread-safe.

The Solution: A Hybrid Public API

For a robust public API, you should default to Static Dispatch (native async traits) for maximum performance, but you must expose a mechanism to enforce Send bounds for multi-threaded users.

Here is the correct implementation pattern for modern Rust libraries.

Step 1: Define the Trait with Native Async

Define your core logic using standard async syntax. This allows zero-cost abstractions for users who don't need dynamic dispatch or thread safety (e.g., single-threaded embedded executors).

use std::future::Future;

pub trait DataProcessor {
    async fn process(&self, input: u32) -> u32;
}
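To make the zero-cost claim concrete, here is a self-contained sketch of an implementation driven on a single thread (the Doubler type and the hand-rolled block_on are illustrative; real code would use an executor such as tokio or a LocalSet):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

trait DataProcessor {
    async fn process(&self, input: u32) -> u32;
}

// Hypothetical implementation for illustration
struct Doubler;

impl DataProcessor for Doubler {
    async fn process(&self, input: u32) -> u32 {
        input * 2
    }
}

// Minimal single-threaded driver using a no-op waker; enough for
// futures that resolve without yielding
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    // No Box, no allocation: the state machine lives on the stack
    assert_eq!(block_on(Doubler.process(21)), 42);
}
```

Note that nothing here is boxed: Doubler.process(21) returns the anonymous state machine by value, and block_on pins it on the stack with the pin! macro.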

Step 2: Enforce Send for Threaded Contexts

To allow users to use your library in Tokio or generic contexts, provide a blanket helper or distinct trait bound helper.

Currently, the most robust way to support Send requirements without forcing them on every user is a supertrait extension or the trait_variant crate. If you want to minimize dependencies, spelling out the bounds manually works best.

Here is how you structure a function that accepts your trait in a multi-threaded context:

// The API consumer function (nightly-only: return-type notation)
// We strictly bound the future returned by `process` to be Send
pub fn run_in_background<D>(processor: D, input: u32)
where
    D: DataProcessor + Send + Sync + 'static,
    // CRITICAL: We must constrain the return type of the trait method
    D::process(..): Send,
{
    tokio::spawn(async move {
        processor.process(input).await;
    });
}

Note: The D::process(..): Send bound uses return-type notation, which is still unstable (the return_type_notation feature). Until it stabilizes, use the standard Send bound wrapper pattern below.

The Production-Ready Workaround (Trait Aliases)

Since explicit associated type bounds are verbose, the standard industry solution (used by heavy hitters like generic web frameworks) is to define a "Send-Safe" version of the trait automatically.

use std::future::Future;

// 1. The Core Trait (Performance focused)
pub trait Service {
    fn call(&self) -> impl Future<Output = ()> + Send; 
    // Note: Explicitly requiring Send here is often acceptable for 
    // server-side libraries, as non-Send futures are rare in backend dev.
}

// 2. The Implementation
struct MyService;

impl Service for MyService {
    // We use async fn here, but the signature matches the trait
    async fn call(&self) {
        // Logic
    }
}
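This pattern can be verified without any runtime. In the sketch below (spawn_like and demo are hypothetical stand-ins I've introduced; spawn_like mimics the Future + Send + 'static bounds of tokio::spawn), the generic function compiles only because the trait itself promises a Send future:

```rust
use std::future::Future;

pub trait Service {
    fn call(&self) -> impl Future<Output = ()> + Send;
}

struct MyService;

impl Service for MyService {
    // An async fn satisfies the `-> impl Future + Send` signature as long
    // as the body only holds Send types across .await points
    async fn call(&self) {}
}

// Stand-in for tokio::spawn's bounds: demands a Send + 'static future
fn spawn_like<F: Future + Send + 'static>(_fut: F) -> bool {
    true
}

fn demo<S: Service + Send + Sync + 'static>(svc: S) -> bool {
    // Compiles only because Service::call guarantees a Send future;
    // with a plain `async fn call` in the trait, this would be rejected
    spawn_like(async move { svc.call().await })
}

fn main() {
    assert!(demo(MyService));
}
```

If you delete + Send from the trait signature, demo stops compiling with exactly the "cannot be sent between threads safely" error shown earlier.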

If you need to support both Send and non-Send executors (e.g., embedded vs. cloud), use the trait_variant crate or a manually desugared approach.

Deep Dive: When to Box in 2024?

Should you ever use Box<dyn Future> anymore? Yes.

While static dispatch is faster, it prevents heterogeneous collections. If you are building a plugin architecture or middleware chain where you need a list of different services processed uniformly, static dispatch fails because every service has a different type.

The Type Erasure Pattern

If your library requires a generic Vec<Middleware>, you must provide a type-erased adapter.

use std::future::Future;
use std::pin::Pin;

// The static, fast trait. Declaring `+ Send` on the returned future lets
// the shim below box it as `dyn Future + Send` on stable Rust.
pub trait Handler {
    fn handle(&self) -> impl Future<Output = ()> + Send;
}

// You can't Box the trait directly while it returns `impl Future`:
// the method is not object-safe. You need an adapter trait (shim).
pub trait DynHandler {
    fn handle_dyn<'a>(&'a self) -> Pin<Box<dyn Future<Output = ()> + Send + 'a>>;
}

// The dynamic wrapper (Library Author provides this)
pub type BoxedHandler = Box<dyn DynHandler + Send + Sync>;

// Blanket impl to make all static Handlers usable as DynHandlers
impl<T: Handler + Send + Sync> DynHandler for T {
    fn handle_dyn<'a>(&'a self) -> Pin<Box<dyn Future<Output = ()> + Send + 'a>> {
        // Box::pin is required: a plain Box<dyn Future> does not
        // implement Future, but Pin<Box<dyn Future>> does
        Box::pin(self.handle())
    }
}
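Here is the shim in action as a self-contained sketch (LogHandler, MetricsHandler, the CALLS counter, and the hand-rolled block_on driver are all illustrative): two handlers with different anonymous future types live in one Vec and are driven uniformly through the boxed interface.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicU32, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Static trait promising Send futures, so the shim can box them as Send
trait Handler {
    fn handle(&self) -> impl Future<Output = ()> + Send;
}

// Object-safe shim returning a pinned, boxed future
trait DynHandler {
    fn handle_dyn<'a>(&'a self) -> Pin<Box<dyn Future<Output = ()> + Send + 'a>>;
}

// Blanket impl: every static Handler is usable as a DynHandler
impl<T: Handler> DynHandler for T {
    fn handle_dyn<'a>(&'a self) -> Pin<Box<dyn Future<Output = ()> + Send + 'a>> {
        Box::pin(self.handle())
    }
}

static CALLS: AtomicU32 = AtomicU32::new(0);

// Two hypothetical middleware stages with distinct future types
struct LogHandler;
struct MetricsHandler;

impl Handler for LogHandler {
    async fn handle(&self) { CALLS.fetch_add(1, Ordering::SeqCst); }
}
impl Handler for MetricsHandler {
    async fn handle(&self) { CALLS.fetch_add(10, Ordering::SeqCst); }
}

// Minimal polling loop with a no-op waker, for futures that never yield
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    // The middleware chain: heterogeneous types behind one interface
    let chain: Vec<Box<dyn DynHandler>> =
        vec![Box::new(LogHandler), Box::new(MetricsHandler)];
    for h in &chain {
        block_on(h.handle_dyn());
    }
    assert_eq!(CALLS.load(Ordering::SeqCst), 11);
}
```

Note the split: only callers that go through handle_dyn() pay for the Box; anyone calling handle() directly on a concrete type still gets the zero-cost static path.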

Summary

The "dilemma" is a choice between optimization and flexibility.

  1. Library Core: Use Native async fn (Static Dispatch). It allows the compiler to inline code and optimize state machine sizes. It is zero-cost.
  2. Concurrency: Be aware that async fn in traits captures lifetimes. If your library targets tokio::spawn, ensure your trait methods return impl Future + Send.
  3. Application Layer: If you need a Vec<Service>, create a DynService wrapper that boxes the future. Do not force the Box on the core trait definition.

By defaulting to impl Future (via async fn), you give your users the raw performance they expect from Rust, while leaving the option open to Box specifically where dynamic dispatch is actually required.