Implementing Distributed Rate Limiting in REST APIs Using Redis

 Scaling a backend to handle millions of requests is a significant architectural milestone. However, operating a distributed API architecture introduces an immediate vulnerability: coordinated abuse. When malicious actors scrape endpoints, enumerate data, or launch layer 7 DDoS attacks, local memory limits provide zero protection.

If you rely on per-instance, in-memory rate limiting within a load-balanced environment, you are effectively multiplying your request limits by the number of active server instances. A client allowed 100 requests per minute can consume 100 requests per node. To enforce a global, strict limit across a cluster, the architecture requires a centralized, high-performance state store.

Redis is the industry standard for this task due to its microsecond latency and single-threaded execution model.

The Root Cause: Local State and Race Conditions

In a monolithic architecture, a standard middleware tracks IP addresses and request counts directly in RAM. In a microservices environment, requests from the same client are routed unpredictably across multiple pods or containers.

Moving the state to a centralized database seems like the obvious solution, but executing standard GET and SET commands introduces a critical race condition. If Node A and Node B receive a request from the same IP at the exact same millisecond, both instances might read a remaining token count of 10. Both nodes decrement the value to 9 and write it back.

Two requests were served, but only one token was deducted. At scale, these lost updates systematically undercount traffic, undermining the very burst protection the limiter exists to provide.
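The lost update can be reproduced without Redis at all. The sketch below simulates the non-atomic GET/SET interleaving with an in-memory map (the `FakeStore` class and the key name are illustrative only):

```typescript
// Simulates a shared counter read and written non-atomically by two nodes:
// the classic read-modify-write race.
class FakeStore {
    private data = new Map<string, number>();
    get(key: string): number | undefined { return this.data.get(key); }
    set(key: string, value: number): void { this.data.set(key, value); }
}

const store = new FakeStore();
store.set('rate_limit:203.0.113.7', 10);

// Node A and Node B both read *before* either writes:
const readA = store.get('rate_limit:203.0.113.7') ?? 10; // 10
const readB = store.get('rate_limit:203.0.113.7') ?? 10; // 10
store.set('rate_limit:203.0.113.7', readA - 1); // Node A writes 9
store.set('rate_limit:203.0.113.7', readB - 1); // Node B overwrites with 9

// Two requests were served, but only one token was deducted.
console.log(store.get('rate_limit:203.0.113.7')); // 9, not 8
```

Node B's write blindly overwrites Node A's, which is exactly what an atomic script prevents.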

The Solution: Atomic Operations with Lua

To eliminate the race condition, the read, calculate, and write steps must happen as one atomic unit. Redis supports server-side Lua scripting and executes each script atomically: no other command runs against the dataset while the script is executing.

Furthermore, we will implement the token bucket algorithm. Unlike a fixed-window counter, which resets abruptly at the top of each minute, the token bucket allows controlled bursts of traffic while enforcing a smooth, sustained rate over time.
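The core refill arithmetic of a token bucket can be sketched as a pure function (the `refill` name is illustrative, not part of the final middleware):

```typescript
// Pure token-bucket refill: given the tokens left at the last update and the
// seconds elapsed since then, compute the tokens available now, capped at capacity.
function refill(tokens: number, capacity: number, refillRate: number, elapsedSec: number): number {
    return Math.min(capacity, tokens + elapsedSec * refillRate);
}

// Capacity 100, refilling 2 tokens per second:
refill(0, 100, 2, 10);  // 20  — partially replenished after 10s
refill(90, 100, 2, 60); // 100 — capped at capacity
refill(40, 100, 2, 0);  // 40  — no time elapsed, no refill
```

A client can burn through the full capacity in a burst, then is throttled to the sustained refill rate until the bucket recovers.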

1. The Redis Lua Script

To avoid client-side clock drift issues across disparate microservices, we use the Redis TIME command directly inside the Lua script, so the token refill logic relies on a single source of truth for time. (Note: because TIME is non-deterministic, scripts that call it before writing depend on effect-based script replication, the default since Redis 5.)

-- token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2]) -- tokens added per second
local requested = tonumber(ARGV[3])

-- Fetch current time directly from Redis to avoid distributed clock drift
local redis_time = redis.call('TIME')
local now_sec = tonumber(redis_time[1])
local now_usec = tonumber(redis_time[2])
local now = now_sec + (now_usec / 1000000)

-- Retrieve the current state of the bucket
local tokens = tonumber(redis.call('HGET', key, 'tokens'))
local last_refill = tonumber(redis.call('HGET', key, 'last_refill'))

if tokens == nil then
    tokens = capacity
    last_refill = now
end

-- Calculate refilled tokens based on elapsed time
local elapsed = math.max(0, now - last_refill)
local refilled = elapsed * refill_rate
tokens = math.min(capacity, tokens + refilled)

local allowed = 0
if tokens >= requested then
    tokens = tokens - requested
    allowed = 1
end

-- Persist the new state
redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)

-- Set a TTL to prevent memory leaks (time to fully refill the bucket)
local ttl = math.ceil(capacity / refill_rate)
redis.call('EXPIRE', key, ttl)

-- Return allowed status and remaining tokens (as string to prevent float truncation in Redis)
return { allowed, tostring(tokens) }

2. The TypeScript Express Middleware

The following Node.js implementation uses the ioredis library, which provides native support for defining and caching custom Lua commands.

import { Request, Response, NextFunction } from 'express';
import Redis from 'ioredis';

// Initialize Redis. Disable offline queue to fail fast during outages.
const redisClient = new Redis({
    host: process.env.REDIS_HOST || '127.0.0.1',
    port: Number(process.env.REDIS_PORT) || 6379,
    enableOfflineQueue: false,
});

// Define the custom Lua command
redisClient.defineCommand('rateLimit', {
    numberOfKeys: 1,
    lua: `
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local requested = tonumber(ARGV[3])

        local redis_time = redis.call('TIME')
        local now = tonumber(redis_time[1]) + (tonumber(redis_time[2]) / 1000000)

        local tokens = tonumber(redis.call('HGET', key, 'tokens'))
        local last_refill = tonumber(redis.call('HGET', key, 'last_refill'))

        if tokens == nil then
            tokens = capacity
            last_refill = now
        end

        local elapsed = math.max(0, now - last_refill)
        tokens = math.min(capacity, tokens + (elapsed * refill_rate))

        local allowed = 0
        if tokens >= requested then
            tokens = tokens - requested
            allowed = 1
        end

        redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(capacity / refill_rate))

        return { allowed, tostring(tokens) }
    `,
});

// Extend the Redis type to include our custom command
declare module 'ioredis' {
    interface Redis {
        rateLimit(key: string, capacity: number, refillRate: number, requested: number): Promise<[number, string]>;
    }
}

export const rateLimiterMiddleware = (capacity: number, refillRate: number) => {
    return async (req: Request, res: Response, next: NextFunction): Promise<void> => {
        // req.ip is preferred; x-forwarded-for may be a string[] or a
        // comma-separated list, so normalize it before falling back
        const forwarded = req.headers['x-forwarded-for'];
        const forwardedIp = Array.isArray(forwarded)
            ? forwarded[0]
            : forwarded?.split(',')[0]?.trim();
        const identifier = req.ip || forwardedIp || 'anonymous';
        const redisKey = `rate_limit:${identifier}`;

        try {
            const [allowed, remainingStr] = await redisClient.rateLimit(
                redisKey,
                capacity,
                refillRate,
                1 // Consuming 1 token per request
            );

            const remaining = Math.floor(parseFloat(remainingStr));

            res.setHeader('X-RateLimit-Limit', capacity);
            res.setHeader('X-RateLimit-Remaining', Math.max(0, remaining));

            if (allowed === 0) {
                res.status(429).json({
                    error: 'Too Many Requests',
                    message: 'You have exceeded your request allocation. Please slow down.'
                });
                return;
            }

            next();
        } catch (error) {
            // Log the error centrally
            console.error('Rate Limiter Error:', error);
            
            // Fail Open Strategy: Do not drop traffic if Redis crashes
            next();
        }
    };
};

3. Applying the Middleware

Integrate the middleware into your Express application by applying it globally or binding it to specific intensive routes.

import express from 'express';
import { rateLimiterMiddleware } from './middleware/rateLimiter';

const app = express();

// Apply a global limit: 100 capacity, replenishing at 2 tokens per second
app.use(rateLimiterMiddleware(100, 2));

app.get('/api/resource', (req, res) => {
    res.json({ data: 'Protected resource accessed successfully.' });
});

app.listen(3000, () => console.log('Server running on port 3000'));

Deep Dive: Why This Architecture Works

Using redisClient.defineCommand leverages the EVALSHA command under the hood. ioredis computes the SHA-1 hash of the Lua script and, on first use, loads it into Redis's script cache; subsequent requests send only the hash plus the arguments rather than the full script text. This minimizes network overhead and bandwidth.
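The hash EVALSHA sends is nothing exotic: it is the SHA-1 digest of the exact script source, which you can reproduce with Node's built-in crypto module (a sketch for illustration):

```typescript
import { createHash } from 'crypto';

// EVALSHA identifies a cached script by the SHA-1 of its exact source text.
function scriptSha1(lua: string): string {
    return createHash('sha1').update(lua).digest('hex');
}

const sha = scriptSha1('return 1');
console.log(sha.length); // 40 hex characters — this is what EVALSHA <sha> transmits

// Any change to the script text — even whitespace — yields a different hash,
// which would force a fresh SCRIPT LOAD (or EVAL fallback) before EVALSHA succeeds.
```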

Because Redis executes Lua scripts atomically, no other command is processed until the script finishes. This strictly serializes concurrent requests for the same rate_limit:{ip} key: the time-elapsed, token-replenishment, and deduction calculations are guaranteed to run against the most current state, eliminating the race condition that undermines naive GET/SET implementations.

Handling Common Pitfalls

Fail Open vs. Fail Closed

In the catch block of the middleware, the next() function is called when a Redis exception occurs. This is known as a "Fail Open" strategy. If the Redis cluster goes down or experiences a network partition, the API remains available (albeit unprotected). Failing closed (returning a 500 Internal Server Error) protects your downstream services but creates a global outage for your users. In highly available microservices, failing open is standard practice unless the downstream services are guaranteed to cascade and crash without rate limits.
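If you want the policy to be an explicit, configurable decision rather than a hard-coded next() call, it can be isolated in a small helper (the `failOpen` flag and helper name are assumptions, not part of the middleware above):

```typescript
// Decide how to respond when the rate-limit backend is unreachable.
// failOpen=true  -> let the request through (availability over protection)
// failOpen=false -> reject (protection over availability), here with a 500
type LimiterFallback = { action: 'next' } | { action: 'reject'; status: number };

function onLimiterError(failOpen: boolean): LimiterFallback {
    return failOpen ? { action: 'next' } : { action: 'reject', status: 500 };
}

console.log(onLimiterError(true).action);  // 'next'   — fail open
console.log(onLimiterError(false).action); // 'reject' — fail closed
```

Making the choice a named flag keeps the trade-off visible in configuration reviews instead of buried in a catch block.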

Reverse Proxies and IP Spoofing

When deploying behind AWS ALB, Nginx, or Cloudflare, req.ip will likely return the IP address of the load balancer. The middleware attempts to read x-forwarded-for, but relying blindly on headers can be dangerous. Attackers can inject fake x-forwarded-for headers to bypass limits. Always configure your framework (e.g., app.set('trust proxy', 1)) to trust only the direct upstream reverse proxy.
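If you do parse the header yourself, trust only the entry appended by your own proxy. With exactly one trusted reverse proxy, that is the rightmost address; everything to its left arrived from the client and is trivially spoofable. A sketch under that single-proxy assumption (the helper name is illustrative):

```typescript
// With exactly one trusted reverse proxy in front of the app, the proxy appends
// the real client IP as the LAST entry of X-Forwarded-For. Earlier entries were
// supplied by the client and must not be trusted.
function clientIpFromXff(header: string | undefined): string | undefined {
    if (!header) return undefined;
    const hops = header.split(',').map((s) => s.trim()).filter(Boolean);
    return hops.length > 0 ? hops[hops.length - 1] : undefined;
}

// Attacker sends a fake header; the trusted proxy appends the real address:
clientIpFromXff('1.2.3.4, 203.0.113.7'); // '203.0.113.7' — the proxy-appended IP
clientIpFromXff(undefined);              // undefined
```

With app.set('trust proxy', 1), Express performs this resolution for you and req.ip already holds the correct address.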

Memory Management and Keyspace

Every unique IP address generates a Redis key. During a distributed layer 7 attack involving millions of rotated IP addresses, memory consumption will spike. The EXPIRE command at the end of the Lua script ensures keys exist only as long as it takes for the bucket to refill to maximum capacity. Once full, tracking the state is redundant, and Redis gracefully evicts the key, protecting your cluster from out-of-memory (OOM) crashes.
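The TTL the script sets is simply the worst-case refill time, rounded up to whole seconds — a quick sanity check of the arithmetic:

```typescript
// Time for an empty bucket to refill completely, in whole seconds.
// After this long with no traffic, the key's state carries no information,
// so expiring it is safe.
function bucketTtlSeconds(capacity: number, refillRate: number): number {
    return Math.ceil(capacity / refillRate);
}

bucketTtlSeconds(100, 2); // 50 — matches the example configuration above
bucketTtlSeconds(10, 3);  // 4  — 3.33s rounded up
```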

Conclusion

Implementing a robust distributed API architecture requires shifting state management from individual application nodes to a high-speed, centralized layer. By combining Redis Lua scripting with the token bucket algorithm, you enforce strict, exact quotas across any number of load-balanced instances. This design guarantees atomic consistency, handles request bursts gracefully, and fortifies your backend against sophisticated abuse.