
Rust vs. Go in 2025: When to Trade Developer Velocity for Raw Performance

 

The Optimization Wall

In the lifecycle of every high-scale backend system, you hit the "Go Ceiling."

You chose Go for its developer velocity, CSP-style concurrency, and the "boring" simplicity that allows new hires to ship code on day one. It worked perfectly—until you hit 500k concurrent WebSocket connections or started ingesting telemetry data at 50GB/hour.

Now, your P99 latency is erratic. You are throwing hardware at the problem, but CPU utilization spikes unpredictably. Profiling reveals the culprit isn't your business logic; it's the Go Runtime. Specifically, the Garbage Collector (GC) trying to manage a massive heap of short-lived objects.

You are faced with a binary choice: rewrite the entire service in Rust (stalling product development for months) or accept the latency/cost inefficiency.

There is a third option: Hybrid Architecture. By offloading specifically the high-churn memory operations to Rust via Foreign Function Interface (FFI), you can keep Go’s orchestration capabilities while bypassing its GC limitations.

The Root Cause: The Generational Hypothesis and GC Pressure

To fix the problem, we must understand the mechanics of the Go runtime vs. Rust's ownership model.

Go's GC Limitations: Go uses a non-compacting, concurrent, mark-sweep collector. While its "stop the world" pauses are now sub-millisecond, the concurrent marking work is not free.

  1. Write Barriers: Every pointer write in Go incurs a small overhead to maintain memory consistency for the GC.
  2. Scan Rate: The more pointers you have on the heap, the more work the GC has to do. If you are building an in-memory aggregator or a high-frequency trading engine, you are likely creating millions of short-lived structs.
  3. CPU Theft: When allocation pressure exceeds the GC's pacing, the runtime forces your application goroutines to help with garbage collection (Mark Assist). This steals CPU cycles directly from your request handlers, causing P99 spikes.

Rust's Advantage: Rust has no GC. It uses affine types (move semantics) and the borrow checker to determine memory cleanup at compile time.

  1. Deterministic Destruction: Memory is freed exactly when it goes out of scope.
  2. No Scans: There is no background process walking your memory graph.
  3. Custom Allocators: You can easily swap to jemalloc or mimalloc for specific fragmentation profiles.

The Solution: The "Go-Rust Bridge"

We will implement a hybrid solution for a High-Frequency Windowed Aggregator.

  • Go handles the network layer (HTTP/WebSocket), auth, and orchestration (its strong suit).
  • Rust handles the stateful, high-memory-churn aggregation logic.
  • CGO serves as the bridge.

Step 1: The Rust Core (The Engine)

We create a Rust library that exposes a C-compatible ABI. We will use a Box to allocate memory on the Rust heap, which is invisible to the Go GC.

Cargo.toml

[package]
name = "aggregator_core"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"] # Important: Compile to .so/.dll/.dylib

[dependencies]
# No external crates needed: c_double and c_longlong come from std::ffi (Rust 1.64+)

src/lib.rs

use std::slice;
use std::ffi::{c_double, c_longlong};

// The heavy state that we want to hide from Go's GC
pub struct WindowAggregator {
    window_sum: f64,
    count: usize,
    samples: Vec<f64>, // Dynamic allocation that would stress Go GC
}

impl WindowAggregator {
    fn new(capacity: usize) -> Self {
        Self {
            window_sum: 0.0,
            count: 0,
            samples: Vec::with_capacity(capacity),
        }
    }

    fn add(&mut self, val: f64) {
        self.samples.push(val);
        self.window_sum += val;
        self.count += 1;
    }

    fn average(&self) -> f64 {
        if self.count == 0 {
            return 0.0;
        }
        self.window_sum / self.count as f64
    }
}

// --- FFI EXPORTS ---

// Create a new instance and return a raw pointer
#[no_mangle]
pub extern "C" fn aggregator_new(capacity: c_longlong) -> *mut WindowAggregator {
    let aggregator = WindowAggregator::new(capacity as usize);
    Box::into_raw(Box::new(aggregator))
}

// Free the memory (Rust ownership reclaimed here)
#[no_mangle]
pub unsafe extern "C" fn aggregator_free(ptr: *mut WindowAggregator) {
    if ptr.is_null() { return; }
    let _ = Box::from_raw(ptr);
}

// Process a batch of data (Batching reduces CGO overhead)
#[no_mangle]
pub unsafe extern "C" fn aggregator_process_batch(
    ptr: *mut WindowAggregator,
    data_ptr: *const c_double,
    len: usize,
) -> c_double {
    // Defensive null checks: a bad pointer from the Go side would
    // otherwise be immediate undefined behavior.
    if ptr.is_null() || data_ptr.is_null() {
        return 0.0;
    }
    let aggregator = &mut *ptr;
    let slice = slice::from_raw_parts(data_ptr, len);

    for &val in slice {
        aggregator.add(val);
    }

    aggregator.average()
}

Compile this with cargo build --release. This yields libaggregator_core.so (Linux), .dylib (macOS), or .dll (Windows).

Step 2: The Go Orchestrator

In Go, we treat the Rust pointer as an opaque unsafe.Pointer. We pass data in batches to amortize the CGO call overhead (on the order of ~150ns per call, varying by Go version and hardware).

main.go

package main

/*
#cgo LDFLAGS: -L./target/release -laggregator_core
#include <stdlib.h>

// Define C types mapping
typedef void* AggregatorPtr;

// Forward declarations of Rust functions
AggregatorPtr aggregator_new(long long capacity);
void aggregator_free(AggregatorPtr ptr);
double aggregator_process_batch(AggregatorPtr ptr, double* data, size_t len);
*/
import "C"

import (
    "fmt"
    "math/rand"
    "runtime"
    "unsafe"
)

// AggregatorWrapper makes the Unsafe Rust calls safe for Go consumers
type AggregatorWrapper struct {
    ptr C.AggregatorPtr
}

func NewAggregator(capacity int64) *AggregatorWrapper {
    return &AggregatorWrapper{
        ptr: C.aggregator_new(C.longlong(capacity)),
    }
}

// Free releases the Rust-side memory. The nil guard makes it safe to call twice.
func (a *AggregatorWrapper) Free() {
    if a.ptr != nil {
        C.aggregator_free(a.ptr)
        a.ptr = nil
    }
}

// ProcessBatch sends a slice of Go float64s to Rust without copying the underlying array
func (a *AggregatorWrapper) ProcessBatch(data []float64) float64 {
    if len(data) == 0 {
        return 0.0
    }
    
    // Per the cgo pointer-passing rules, the slice's backing array is
    // kept alive (and effectively pinned) for the duration of the C call
    cPtr := (*C.double)(unsafe.Pointer(&data[0]))
    cLen := C.size_t(len(data))

    return float64(C.aggregator_process_batch(a.ptr, cPtr, cLen))
}

func main() {
    // 1. Setup Phase
    // Large capacity to demonstrate Rust handling large heap allocations
    agg := NewAggregator(1_000_000)
    
    // Ensure cleanup happens, though in a long-running server this binds to app lifecycle
    defer agg.Free() 

    fmt.Println("Rust Aggregator Initialized.")

    // 2. Simulation of High-Throughput Ingestion
    // Create a batch of telemetry data
    batchSize := 500_000
    data := make([]float64, batchSize)
    for i := 0; i < batchSize; i++ {
        data[i] = rand.Float64() * 100
    }

    // 3. Execution
    // Print runtime stats before
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("Go Heap Alloc Before: %v MB\n", m.HeapAlloc/1024/1024)

    // Send to Rust (Computation happens there, memory stored there)
    avg := agg.ProcessBatch(data)

    // 4. Verification
    runtime.ReadMemStats(&m)
    fmt.Printf("Average: %.4f\n", avg)
    
    // Notice: The massive 'samples' vector inside Rust does NOT appear in Go's HeapAlloc
    fmt.Printf("Go Heap Alloc After:  %v MB\n", m.HeapAlloc/1024/1024) 
}

Step 3: Build and Run

To run this, you must link the dynamic library.

# Build Rust release
cargo build --release

# Run Go (assumes Linux/macOS)
# LD_LIBRARY_PATH tells the dynamic loader where to find our Rust library at run time
# (on macOS, use DYLD_LIBRARY_PATH instead)
export CGO_LDFLAGS="-L$(pwd)/target/release -laggregator_core"
export LD_LIBRARY_PATH="$(pwd)/target/release"

go run main.go

Why This Works

1. Zero GC Pressure for State

In the Go code above, the data slice is transient. Once ProcessBatch returns, Go can reclaim that memory immediately. However, the state (the accumulating history of samples) lives inside the WindowAggregator struct in Rust. Go's Garbage Collector does not see the Vec<f64> inside the Rust struct. It only sees a single pointer (unsafe.Pointer). You can store 10GB of history in Rust, and Go's GC scan time will remain constant (essentially zero).

2. Batching for Performance

Notice the aggregator_process_batch signature. Calling into C via CGO has overhead. If you called add(val) one million times inside a Go loop, the per-call overhead (stack switching, scheduler interaction) would destroy performance. By passing a pointer to the slice's first element (&data[0]), we get zero-copy sharing: Rust reads the memory directly from Go's address space, and we process 500,000 items in a single FFI hop.

3. Safety Enforcement

Rust guarantees that the Vec handling is memory-safe. Go guarantees that the network handling is concurrency-safe. We use defer agg.Free() to ensure we don't leak Rust memory when the Go wrapper is done (though in a persistent backend service, this often lives for the process duration).

Conclusion

In 2025, the choice isn't "Rust OR Go." It is about placing the complexity where it belongs.

Use Go for the 90% of your codebase that requires high velocity, readability, and standard I/O orchestration. Use Rust for the 10% of the hot path where object churn creates GC pressure that hardware cannot solve.

By using the hybrid CGO approach with batched processing, you effectively "opt-out" of the Go GC for your heaviest datasets, gaining raw systems performance without sacrificing the productivity of your wider team.
