OCaml 5 Migration: Porting Lwt Promises to Eio Effects

The move from Lwt (cooperative threading via monads) to Eio (direct-style concurrency via effect handlers) is the most significant paradigm shift in the OCaml ecosystem in a decade. While the promise of "no more monads" is alluring, the migration path is fraught with invisible dangers.

The primary friction point is not syntax; it is the fundamental change in the execution model. In Lwt, context switches occur only at explicit bind points (>>= or let*), and you may have implicitly relied on this for atomicity. In Eio, any call that performs an effect can suspend the fiber without a marker in the source, and once work is spread across multiple domains, code runs in true parallel. Consequently, logic that was thread-safe by accident in Lwt can become a race condition in Eio.
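To make the hazard concrete, here is a minimal sketch that uses only the OCaml 5 standard library (Domain and Mutex) rather than Eio, so it runs without extra dependencies. The names iterations, racy_total and locked_total are invented for this illustration. The unsynchronized version may lose updates when the two domains interleave; the locked version always counts every increment.

```ocaml
(* Two domains bump a shared counter. Stdlib only (OCaml 5.0+).
   All names here are hypothetical, chosen for this sketch. *)
let iterations = 100_000

(* Unsynchronized read-modify-write: updates may be lost in parallel. *)
let racy_total () =
  let counter = ref 0 in
  let work () =
    for _ = 1 to iterations do
      counter := !counter + 1
    done
  in
  let d1 = Domain.spawn work in
  let d2 = Domain.spawn work in
  Domain.join d1;
  Domain.join d2;
  !counter

(* Same loop, with the read-modify-write turned into a critical section. *)
let locked_total () =
  let counter = ref 0 in
  let lock = Mutex.create () in
  let work () =
    for _ = 1 to iterations do
      Mutex.lock lock;
      counter := !counter + 1;
      Mutex.unlock lock
    done
  in
  let d1 = Domain.spawn work in
  let d2 = Domain.spawn work in
  Domain.join d1;
  Domain.join d2;
  !counter
```

The result of racy_total varies from run to run, which is exactly what makes this class of bug hard to spot after a port. In Eio code, Eio.Mutex would play the role that Stdlib.Mutex plays here.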

This post details the rigorous migration of a stateful, asynchronous module from Lwt to Eio, ensuring structured concurrency and thread safety.

The Why: Monadic Cooperative vs. Direct Structured Concurrency

To port correctly, you must understand the mechanical divergence.

Lwt (Legacy):

  • Heap-allocated Promises: Lwt.t values represent future computations.
  • Callback Chain: The scheduler executes a callback when a promise resolves.
  • Implicit Global Scope: Lwt.async allows fire-and-forget tasks that can outlive their creator, leading to resource leaks.
  • Cooperative Scheduling: On a single core, code between binds is atomic.

Eio (Modern):

  • Stack-based Fibers: Suspending happens via Effect Handlers, preserving the stack. Code looks synchronous.
  • Structured Concurrency: All fibers must be scoped to a Switch. A parent cannot return until all children terminate.
  • Parallel-capable: Eio fibers are cooperative within a domain, but domains run in parallel. Shared state requires genuine synchronization primitives (Eio.Mutex, Eio.Semaphore).
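The "stack-based fibers" point can be illustrated with a toy round-robin scheduler written directly against the standard Effect module. This is only a sketch of the mechanism, not how Eio is implemented; the Yield effect and the names ready, spawn, fiber and trace are invented for the example.

```ocaml
(* Toy round-robin scheduler on OCaml 5 effects. Illustrative only. *)
open Effect
open Effect.Deep

(* Hypothetical effect meaning "let another fiber run". *)
type _ Effect.t += Yield : unit Effect.t

let log : string list ref = ref []
let note s = log := s :: !log

(* Fibers that are ready to start or to be resumed. *)
let ready : (unit -> unit) Queue.t = Queue.create ()

let run_next () =
  match Queue.take_opt ready with
  | Some resume -> resume ()
  | None -> ()

(* Run [f] as a fiber. On Yield, its whole stack is captured in [k]. *)
let spawn (f : unit -> unit) : unit =
  match_with f ()
    { retc = (fun () -> run_next ());
      exnc = raise;
      effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              Queue.add (fun () -> continue k ()) ready;
              run_next ())
        | _ -> None) }

(* Direct-style body: no binds, yet it suspends in the middle. *)
let fiber name () =
  note (name ^ ":start");
  perform Yield;
  note (name ^ ":end")

let trace =
  Queue.add (fun () -> spawn (fiber "a")) ready;
  Queue.add (fun () -> spawn (fiber "b")) ready;
  run_next ();
  List.rev !log
(* trace = ["a:start"; "b:start"; "a:end"; "b:end"] *)
```

The fiber body contains no monadic plumbing, yet the two fibers interleave at perform Yield. That is the property that makes Eio code look synchronous while still being concurrent.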

The error most teams make is stripping the let* syntax without introducing the necessary Eio.Switch scopes or Eio.Mutex locks, resulting in "stuck" fibers or corrupted shared state.

The Fix: A Step-by-Step Migration

We will refactor a "Throttled Cache" service. This common pattern involves checking a hash map, potentially fetching data (network IO), and updating the state, all while ensuring we don't fetch the same key twice concurrently.

1. The Legacy Code (Lwt)

In Lwt, we used Lwt_mutex to protect the critical section. Note the monadic bind syntax.

(* legacy_cache.ml *)
open Lwt.Infix

type 'a entry =
  | Pending of 'a Lwt.t
  | Cached of 'a

type 'a t = {
  cache : (string, 'a entry) Hashtbl.t;
  lock : Lwt_mutex.t;
}

let create () = {
  cache = Hashtbl.create 100;
  lock = Lwt_mutex.create ();
}

(* Mock database fetch *)
let fetch_from_db key =
  Lwt_unix.sleep 0.1 >>= fun () ->
  Lwt.return (Printf.sprintf "Value_for_%s" key)

let get_or_fetch t key =
  Lwt_mutex.with_lock t.lock (fun () ->
    match Hashtbl.find_opt t.cache key with
    | Some (Cached v) -> Lwt.return v
    | Some (Pending p) -> p (* Join existing fetch *)
    | None ->
        (* No entry: create a pending promise *)
        let p, resolver = Lwt.task () in
        Hashtbl.add t.cache key (Pending p);
        
        (* Fork the fetch operation *)
        Lwt.async (fun () ->
          fetch_from_db key >>= fun result ->
          (* No lock is taken here. In single-threaded Lwt nothing can
             interleave between bind points, so this update is safe,
             but only because of that scheduling model. It is brittle. *)
          Hashtbl.replace t.cache key (Cached result);
          Lwt.wakeup resolver result;
          Lwt.return_unit
        );
        p
  )

2. The Migration Strategy: Hybrid Bridge

Rewriting the entire application atomically is rarely feasible. We use Lwt_eio to bridge the gap. This allows us to run the Lwt event loop inside Eio, permitting gradual refactoring.

However, we will skip straight to the Pure Eio implementation to demonstrate the architectural changes required for correctness.
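For readers who do need the bridge, the following is a minimal sketch of calling an unported Lwt function from Eio code. It assumes the lwt_eio package at version 0.3 or later, where with_event_loop passes a token to its callback; legacy_fetch and fetch_via_bridge are hypothetical names standing in for your own code.

```ocaml
(* Sketch: running a legacy Lwt computation from Eio via lwt_eio.
   Assumes lwt_eio >= 0.3, plus eio_main and lwt.unix. *)

(* Stand-in for a function that has not been ported yet. *)
let legacy_fetch key : string Lwt.t =
  Lwt.bind (Lwt_unix.sleep 0.01) (fun () ->
    Lwt.return ("Value_for_" ^ key))

let fetch_via_bridge () =
  Eio_main.run @@ fun env ->
  (* Start Lwt's event loop inside the Eio scheduler. *)
  Lwt_eio.with_event_loop ~clock:(Eio.Stdenv.clock env) @@ fun _token ->
  (* Run the Lwt computation and wait for its result in direct style. *)
  Lwt_eio.run_lwt (fun () -> legacy_fetch "UserA")
```

With this in place, modules can be ported one at a time: new Eio code calls old Lwt code through Lwt_eio.run_lwt until the old code is rewritten.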

3. The Modern Solution (Eio)

In Eio, we replace Lwt_mutex with Eio.Mutex. Crucially, we replace Lwt.task (promises) with Eio.Promise. We also utilize Eio.Switch to manage the lifetime of the background fetchers, ensuring no orphaned fibers.

(* modern_cache.ml *)
open Eio.Std

type 'a entry =
  | Pending of 'a Promise.t
  | Cached of 'a

type 'a t = {
  cache : (string, 'a entry) Hashtbl.t;
  (* Eio.Mutex is safe for Domains, unlike Lwt_mutex *)
  lock : Eio.Mutex.t; 
}

let create () = {
  cache = Hashtbl.create 100;
  lock = Eio.Mutex.create ();
}

(* Direct style: no monads. The call looks blocking, but Eio.Time.sleep
   performs an effect that suspends only the calling fiber. *)
let fetch_from_db ~clock key =
  Eio.Time.sleep clock 0.1;
  Printf.sprintf "Value_for_%s" key

let get_or_fetch ~sw ~clock t key =
  (* Critical section 1: check or reserve. The lock is held only while
     touching the table. Awaiting inside it would deadlock, because the
     fetch fiber needs the same lock to publish its result. *)
  let entry =
    Eio.Mutex.use_rw ~protect:true t.lock (fun () ->
      match Hashtbl.find_opt t.cache key with
      | Some entry -> entry
      | None ->
          (* Create a promise and a resolver *)
          let p, resolver = Promise.create () in
          Hashtbl.add t.cache key (Pending p);
          (* Spawn a new fiber within the provided switch.
             This replaces Lwt.async. *)
          Fiber.fork ~sw (fun () ->
            let result = fetch_from_db ~clock key in
            (* Critical section 2: update state *)
            Eio.Mutex.use_rw ~protect:true t.lock (fun () ->
              Hashtbl.replace t.cache key (Cached result));
            (* Resolve the promise, waking up waiting fibers *)
            Promise.resolve resolver result);
          Pending p)
  in
  (* Outside the lock: return the cached value or wait for the fetch. *)
  match entry with
  | Cached v -> v
  | Pending p -> Promise.await p

(* Usage Entry Point *)
let main () =
  Eio_main.run @@ fun env ->
  let clock = Eio.Stdenv.clock env in
  let cache = create () in
  
  (* Switch creates a scope. get_or_fetch cannot leak fibers 
     outside this block. *)
  Switch.run @@ fun sw ->
    Fiber.both
      (fun () -> 
         let v = get_or_fetch ~sw ~clock cache "UserA" in
         traceln "Fiber 1 got: %s" v)
      (fun () -> 
         let v = get_or_fetch ~sw ~clock cache "UserA" in
         traceln "Fiber 2 got: %s" v)

The Explanation

1. Resource Bounding with Switch

In the Lwt example, Lwt.async detached the fetch operation from the caller's control flow. If the main program exited, that async operation might be cut off abruptly or leak.

In Eio, get_or_fetch accepts a named argument ~sw. We use Fiber.fork ~sw. This guarantees that the Switch.run block in main cannot exit until the fetch fiber completes. This is structured concurrency: control flow matches lexical scope.
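The guarantee is easy to observe in isolation. The sketch below assumes the eio and eio_main packages; events and note are names made up for the example. It forks a fiber that yields once, then records what has happened by the time Switch.run returns.

```ocaml
(* Sketch: Switch.run does not return until forked fibers finish.
   Assumes the eio and eio_main packages. *)
open Eio.Std

let events =
  Eio_main.run @@ fun _env ->
  let log = ref [] in
  let note s = log := s :: !log in
  Switch.run (fun sw ->
    Fiber.fork ~sw (fun () ->
      Fiber.yield ();               (* suspend so the parent can go on *)
      note "child finished");
    note "parent body done");
  (* Reached only after the forked fiber has completed. *)
  note "after switch";
  List.rev !log
```

Whatever order the scheduler picks inside the switch, "child finished" is always recorded before "after switch". With Lwt.async there is no equivalent point in the code where that can be asserted.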

2. Synchronization with Eio.Mutex

In Lwt, we often got away with reading and writing mutable variables without locks if there was no >>= in between. In OCaml 5, if fetch_from_db were running on a different domain (via Eio.Domain_manager), accessing the Hashtbl without a lock would be a data race. OCaml 5 remains memory-safe under data races, so this does not segfault, but the standard library documents that unsynchronized access can leave a Hashtbl in an invalid state, which shows up as lost entries or unexpected exceptions. Eio.Mutex protects the critical sections regardless of whether the fibers are on the same domain or distinct domains.

3. The Promise Mechanism

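In Lwt, calling a promise-returning function starts the work immediately ("hot"), and the Lwt.t is both the handle on that work and its eventual result. Eio promises are explicit coordination points, separate from the fiber that resolves them.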

  • Promise.create () returns a promise and a resolver.
  • Promise.await p suspends the current fiber (via effects) until Promise.resolve is called.
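  • Unlike Lwt, Promise.await does not infect the function signature. get_or_fetch returns a plain string rather than a string Lwt.t; apart from the ~sw and ~clock capabilities, its shape is the direct-style string -> string.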
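The three points above can be seen in a few lines. This sketch assumes the eio and eio_main packages; the name received is invented for the example. One fiber awaits the promise while the other resolves it.

```ocaml
(* Sketch: a promise as a coordination point between two fibers.
   Assumes the eio and eio_main packages. *)
open Eio.Std

let received =
  Eio_main.run @@ fun _env ->
  let p, resolver = Promise.create () in
  (* Nothing is running yet: the promise is only a placeholder. *)
  let resolved_at_start = Promise.is_resolved p in
  let got = ref None in
  Fiber.both
    (fun () -> got := Some (Promise.await p))      (* suspends here *)
    (fun () -> Promise.resolve resolver "ready");  (* wakes the waiter *)
  (resolved_at_start, !got)
(* received = (false, Some "ready") *)
```

Promise.await returns a plain string here, so the waiting fiber's code needs no bind or map to use the value.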

Conclusion

Porting to Eio requires unlearning the "bind chain" muscle memory and adopting a resource-oriented mindset.

  1. Stop returning promises. Return values directly.
  2. Pass Switch explicitly. Do not create background fibers implicitly; make the lifecycle ownership clear in the API.
  3. Lock aggressively. Assume parallelism is active. Use Eio.Mutex around all shared mutable state, especially standard library structures like Hashtbl, which are not thread-safe.

By following these patterns, you align with OCaml 5's multicore runtime capabilities while eliminating the callback hell of the past.