The "3:00 AM" Spike
Your monitoring dashboard is all green. CPU usage is sitting at a comfortable 15%. Suddenly, without a corresponding increase in user traffic, your primary database CPU spikes to 100%, connections time out, and the API latency graph goes vertical. By the time you SSH in, the system has recovered.
You just fell victim to the Cache Stampede (also known as the Thundering Herd).
This isn't a traffic problem; it's a synchronization problem. A highly accessed cache key (e.g., a global configuration object or a trending leaderboard) expired. At that exact millisecond, 5,000 concurrent requests hit your backend. They all missed the cache simultaneously. They all decided to rebuild the expensive query simultaneously. Your database, unable to process 5,000 heavy aggregation queries in parallel, effectively DDoS'd itself.
The Root Cause: The Gap of Doom
The standard caching pattern is "Check Cache -> Miss -> Read DB -> Write Cache".
In a high-concurrency environment, the time delta between the Cache Miss and the Cache Write is non-zero. Let's call this $\Delta$ (computation time).
If your key expires at $T$, and 10,000 requests arrive between $T$ and $T + \Delta$, every single one of those requests enters the database read phase.
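To make the gap concrete, here is a minimal sketch of that naive read-through flow, assuming an ioredis client and a hypothetical runExpensiveQuery() stand-in for the heavy aggregation; every request that arrives during $\Delta$ falls into the same miss branch.
import Redis from "ioredis";

const redis = new Redis("redis://localhost:6379");

// Hypothetical stand-in for the heavy aggregation query.
async function runExpensiveQuery() {
  return { topUsers: [], generatedAt: Date.now() };
}

async function getLeaderboardNaive() {
  const cached = await redis.get("leaderboard:global");
  if (cached) return JSON.parse(cached);

  // The Gap of Doom: every request that misses during Δ lands here...
  const data = await runExpensiveQuery(); // ...and each one runs the expensive query.
  await redis.set("leaderboard:global", JSON.stringify(data), "EX", 60);
  return data;
}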
Standard solutions usually involve:
- Mutex Locking: Only allow one process to rebuild; a rough sketch of this approach follows this list. (Drawback: Latency spikes for the blocked threads; complexity in distributed locking).
- Background Refresh: A separate worker updates keys. (Drawback: Infrastructure overhead; managing decoupled state).
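For comparison, here is a rough sketch of the mutex-lock option using a best-effort Redis lock (SET with NX and PX). The lock:${key} naming, the 5-second lock TTL, and the 100 ms polling loop are illustrative assumptions; a production-grade distributed lock needs considerably more care (ownership tokens, renewal, fencing).
import Redis from "ioredis";

const redis = new Redis("redis://localhost:6379");

async function getWithLock<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlSeconds: number
): Promise<T> {
  while (true) {
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached) as T;

    // NX = only set if the lock key is absent, PX = auto-release after 5 seconds.
    const acquired = await redis.set(`lock:${key}`, "1", "PX", 5000, "NX");
    if (acquired) {
      try {
        const data = await fetcher();
        await redis.set(key, JSON.stringify(data), "EX", ttlSeconds);
        return data;
      } finally {
        await redis.del(`lock:${key}`);
      }
    }

    // Everyone who lost the race blocks and polls: this is the latency spike.
    await new Promise((resolve) => setTimeout(resolve, 100));
  }
}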
There is a mathematically superior approach that requires no locks and no sidecar processes: Probabilistic Early Expiration.
The Fix: Probabilistic Early Expiration (PER)
Instead of waiting for the key to expire, we allow individual requests to "volunteer" to refresh the cache before it expires, based on a probability function. As the Time-To-Live (TTL) approaches zero, the probability of a refresh increases exponentially.
This technique allows us to:
- Keep the key in Redis slightly longer than the logical expiry (Grace Period).
- Serve slightly stale data to the herd while one request performs the update.
- Eliminate the gap where the key does not exist.
The Algorithm (X-Fetch)
We use an algorithm often referred to as optimal probabilistic recomputation: a request recomputes the value when
$$ \text{currentTime} - \Delta \cdot \beta \cdot \log(\text{rand()}) \geq \text{expiry} $$
Where:
- $\Delta$: The time it takes to recompute the value (in ms).
- $\beta$: A tuning factor (typically 1.0). Higher values trigger earlier refreshes.
- $\log(\text{rand()})$: rand() draws uniformly from $(0, 1)$, so its logarithm is a negative value whose magnitude is exponentially distributed; this supplies the randomness that spreads refreshes out across requests.
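As a quick sanity check on the formula: because $-\log(\text{rand()})$ is exponentially distributed with mean 1, the expected "early recompute" lead time is $\Delta \cdot \beta$, and the chance that a single request volunteers when $t$ ms remain before expiry is
$$ P(\text{recompute}) = e^{-t / (\Delta \cdot \beta)} $$
With an illustrative $\Delta$ of 200 ms and $\beta = 1$, that is roughly 0.7% with one second remaining, 5% at 600 ms, and 37% at 200 ms, which is why refreshes cluster tightly just before expiry instead of firing all at once.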
Implementation
We will implement a robust CacheManager in TypeScript. This implementation decouples the Logical TTL (when we want to refresh) from the Physical TTL (when Redis actually deletes the key).
Prerequisites
npm install ioredis
npm install -D typescript @types/node
The Code
import Redis from "ioredis";

interface CachePayload<T> {
  data: T;
  logicalExpiry: number; // Unix timestamp in ms
  computeTime: number; // How long the last refresh took in ms
}

interface CacheOptions {
  ttlSeconds: number;
  beta?: number; // Default 1.0
}

export class ProbabilisticCache {
  private redis: Redis;

  constructor(redisConnectionString: string) {
    this.redis = new Redis(redisConnectionString, {
      // Production-grade settings
      enableOfflineQueue: false,
      maxRetriesPerRequest: 3,
    });
  }

  /**
   * Retrieves data from cache or recomputes it using
   * Probabilistic Early Expiration (PER).
   */
  async getOrSet<T>(
    key: string,
    fetcher: () => Promise<T>,
    options: CacheOptions
  ): Promise<T> {
    const { ttlSeconds, beta = 1.0 } = options;

    // 1. Attempt to get the current value
    const cachedString = await this.redis.get(key);

    // 2. Hard Miss: Key doesn't exist at all.
    //    We must compute immediately (blocking).
    if (!cachedString) {
      return this.recomputeAndCache(key, fetcher, ttlSeconds);
    }

    const payload: CachePayload<T> = JSON.parse(cachedString);
    const now = Date.now();

    // 3. Probabilistic Check
    //    Formula: now - (delta * beta * log(rand)) >= expiry
    //    If true, we are in the "danger zone" and should recompute.
    if (this.shouldRefresh(payload, beta, now)) {
      // Fire-and-forget recomputation to avoid blocking the current request.
      // In extremely strict consistency models, you might await this.
      // For stampede prevention, we return the 'stale' data and update in background.
      this.recomputeAndCache(key, fetcher, ttlSeconds).catch((err) => {
        console.error(`[Cache] Background refresh failed for ${key}`, err);
      });
    }

    // 4. Return the data (might be slightly stale, but highly available)
    return payload.data;
  }

  private shouldRefresh<T>(
    payload: CachePayload<T>,
    beta: number,
    now: number
  ): boolean {
    const delta = payload.computeTime;
    const expiry = payload.logicalExpiry;

    // Math.random() returns [0, 1). Math.log of that is negative infinity to 0.
    // As we get closer to expiry, the LHS becomes larger.
    const probabilisticDelta = delta * beta * Math.log(Math.random());
    return (now - probabilisticDelta) >= expiry;
  }

  private async recomputeAndCache<T>(
    key: string,
    fetcher: () => Promise<T>,
    ttlSeconds: number
  ): Promise<T> {
    const start = Date.now();
    const data = await fetcher();
    const duration = Date.now() - start;

    const payload: CachePayload<T> = {
      data,
      logicalExpiry: Date.now() + (ttlSeconds * 1000),
      computeTime: duration,
    };

    // PHYSICAL TTL: We set the Redis TTL longer than the logical TTL.
    // Standard practice is 2x or logical + fixed buffer.
    // This ensures data exists for other threads while one thread refreshes.
    const physicalTtl = Math.ceil(ttlSeconds * 2);

    await this.redis.set(
      key,
      JSON.stringify(payload),
      "EX",
      physicalTtl
    );

    return data;
  }

  async disconnect() {
    await this.redis.quit();
  }
}
Usage Example
// simulated-db.ts
const cache = new ProbabilisticCache("redis://localhost:6379");
async function getExpensiveLeaderboard() {
  return await cache.getOrSet(
    "leaderboard:global",
    async () => {
      // Simulate expensive DB query (200ms)
      await new Promise(resolve => setTimeout(resolve, 200));
      return { topUsers: ["Alice", "Bob", "Charlie"], generatedAt: Date.now() };
    },
    { ttlSeconds: 60, beta: 1.0 }
  );
}
// In your API Controller
// Request 1: Triggers a background refresh if the key is close to expiry
// Requests 2-1,000: Served from the cache instantly, with no synchronous DB hits
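To see the effect empirically, a rough load-test sketch like the following (a hypothetical stampede-test.ts, reusing the ProbabilisticCache class above against a local Redis) spreads a few hundred requests across several TTL cycles and counts how often the fetcher actually runs. Expect a small number of fetcher calls rather than one per request, though a few concurrent "winners" per cycle are normal.
async function simulateTraffic() {
  const cache = new ProbabilisticCache("redis://localhost:6379");
  let dbHits = 0;

  const fetcher = async () => {
    dbHits++;
    await new Promise((resolve) => setTimeout(resolve, 200)); // pretend this is the slow query
    return { generatedAt: Date.now() };
  };

  // ~300 requests spread over ~3 seconds against a 1-second logical TTL.
  for (let i = 0; i < 300; i++) {
    await cache.getOrSet("leaderboard:simulated", fetcher, { ttlSeconds: 1 });
    await new Promise((resolve) => setTimeout(resolve, 10));
  }

  console.log(`Requests: 300, fetcher calls: ${dbHits}`); // far fewer fetcher calls than requests
  await cache.disconnect();
}

simulateTraffic().catch(console.error);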
The Explanation
Why does this work so effectively?
1. The Safety Net (Decoupled TTLs)
In the code above, logicalExpiry is when the application considers the data old (e.g., 60 seconds). However, we tell Redis to keep the key for 120 seconds (physicalTtl).
When the clock hits 60.1 seconds, the key still exists in Redis. Threads 2 through 5,000 will simply read the existing payload and return it. They don't crash the DB because they never experience a "Miss".
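To see the decoupling concretely, a quick inspection sketch (assuming the leaderboard key from the usage example and a plain ioredis client) shows a key that Redis still holds even though the application already considers it stale:
import Redis from "ioredis";

async function inspectKey() {
  const redis = new Redis("redis://localhost:6379");

  const raw = await redis.get("leaderboard:global"); // still present ~60.1s in: physical TTL is 120s
  if (raw) {
    const payload = JSON.parse(raw);
    console.log(Date.now() > payload.logicalExpiry);    // true: logically stale, but served anyway
    console.log(await redis.ttl("leaderboard:global")); // roughly 60: seconds until Redis deletes it
  }

  await redis.quit();
}

inspectKey().catch(console.error);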
2. The Probabilistic Trigger
The magic lies in shouldRefresh.
- If the key is fresh (e.g., created 1 second ago), shouldRefresh returns false 99.99% of the time.
- As the current time approaches logicalExpiry, the term delta * beta * log(rand()) creates a randomized window.
- The delta (compute time) factor is crucial. If the query takes longer to compute, we start rolling the dice earlier to ensure we have enough time to refresh before the logical expiry hits. (A small simulation sketch follows this list.)
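A standalone sketch of that dice roll makes the curve visible; it simply replays the shouldRefresh condition many times at different distances from logicalExpiry. The delta of 200 ms and beta of 1.0 are illustrative assumptions.
// Empirically estimates the per-request refresh probability at a given distance from expiry.
function refreshProbability(msToExpiry: number, delta: number, beta: number, trials = 100000): number {
  let hits = 0;
  for (let i = 0; i < trials; i++) {
    // Same condition as shouldRefresh, rearranged: -delta * beta * log(rand) >= msToExpiry
    if (-delta * beta * Math.log(Math.random()) >= msToExpiry) hits++;
  }
  return hits / trials;
}

for (const msToExpiry of [1000, 600, 400, 200, 0]) {
  const p = refreshProbability(msToExpiry, 200, 1.0);
  console.log(`${msToExpiry}ms before expiry: ~${(p * 100).toFixed(1)}% of requests volunteer`);
}
// Expect roughly: 1000ms -> 0.7%, 600ms -> 5%, 400ms -> 13.5%, 200ms -> 36.8%, 0ms -> 100%
Raising beta shifts these percentages earlier; lowering it pushes them closer to the logical expiry.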
3. Non-Blocking (Fire and Forget)
Notice the fire-and-forget call in getOrSet: this.recomputeAndCache(...).catch(...). When a lucky request wins the lottery to refresh the cache, it doesn't necessarily have to block its own response (though it can). It triggers the refresh. Even if 10 requests win the lottery simultaneously, 10 DB queries are manageable. 5,000 are not.
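If even the handful of simultaneous winners bothers you, one optional per-process hardening, not part of the core pattern above (the inFlight map and refreshOnce name are illustrative), is to coalesce concurrent background refreshes for the same key into a single promise:
// Coalesces concurrent "lottery winners" within one process so they share one in-flight refresh.
const inFlight = new Map<string, Promise<unknown>>();

function refreshOnce<T>(key: string, refresh: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  const pending = refresh().finally(() => inFlight.delete(key));
  inFlight.set(key, pending);
  return pending;
}

// Inside getOrSet, the background call would then become:
// refreshOnce(key, () => this.recomputeAndCache(key, fetcher, ttlSeconds)).catch(...)
Cross-process deduplication would reintroduce distributed locking, which is exactly the complexity PER avoids, so in-process coalescing is usually a sensible stopping point.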
Conclusion
Locking mechanisms treat symptoms; Probabilistic Early Expiration treats the architectural flaw of synchronous expiry.
By implementing PER, you change the traffic pattern on your database from a "cliff edge" (zero to thousands of concurrent queries within a millisecond) to a gentle, statistically distributed curve. The cost is serving data that is marginally stale, typically by no more than about one recompute interval, a tradeoff most distributed system architects should happily accept in exchange for a hot key that never disappears from the cache.