Handling '529 Overloaded' and '429 Rate Limit' Errors in Anthropic API

It is 2:00 AM. Your monitoring dashboard lights up with a spike in API errors. Your LLM-powered feature, the core of your application, is failing. Upon inspecting the logs, you don't see the standard "Service Unavailable" errors. Instead, you are met with 529 Overloaded or 429 Too Many Requests.

For Site Reliability Engineers (SREs) and Backend Developers integrating the Anthropic API (Claude), these two errors are the primary adversaries of uptime. While the official SDKs provide basic retry mechanisms, they are often insufficient for high-throughput production environments facing genuine traffic spikes.

This guide details the root causes of these errors and provides production-grade, copy-pasteable implementation patterns for Node.js and Python to handle them gracefully using Exponential Backoff with Jitter.

Root Cause Analysis: Why Your Requests Fail

To fix the crash, we must understand the architecture of the failure.

The 429 Error (Rate Limit Exceeded)

This is a client-side volume issue. Anthropic enforces limits on Requests Per Minute (RPM), Tokens Per Minute (TPM), and concurrent connections to ensure fair usage.

When you hit a 429, the API is telling you to stop. The response often includes a retry-after header. If your application immediately retries without respecting this signal, every retry burns quota against a limit that has not yet reset, prolonging the lockout instead of ending it.

The 529 Error (System Overloaded)

This is specific to Anthropic's infrastructure. Unlike a generic 500 Internal Server Error (which implies a bug or crash), a 529 indicates that the specific model's compute capacity is temporarily saturated.

During peak hours, compute availability fluctuates. If you treat a 529 like a hard failure and drop the request, you degrade user experience unnecessarily. If you retry immediately in a tight loop, you contribute to the "thundering herd" problem, worsening the congestion for everyone.

The Strategy: Exponential Backoff with Jitter

The only robust solution for both 429 and 529 errors is Exponential Backoff with Jitter.

  1. Exponential Backoff: Wait longer between each failed retry (e.g., 1s, 2s, 4s, 8s). This clears the immediate congestion.
  2. Jitter: Add a random variance to the wait time.

Without Jitter, if 1,000 requests fail at 12:00:00, they will all retry exactly at 12:00:01, then 12:00:03. This synchronization keeps the server hammered. Jitter desynchronizes these requests, smoothing out the traffic spike.
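Combined, the two steps fit in a few lines. This sketch uses the same base and cap values as the implementations below, but is otherwise generic Python:

```python
import random

BASE_DELAY = 1.0   # seconds
MAX_DELAY = 32.0   # cap so late retries don't wait for minutes

def full_jitter_delay(attempt: int) -> float:
    """Full Jitter: a random wait in [0, min(cap, base * 2**attempt)]."""
    capped = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
    return random.uniform(0, capped)
```

With Full Jitter the average wait is half the exponential ceiling, trading a little latency variance for a much flatter retry curve on the server side.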

Solution 1: Node.js (TypeScript)

In the Node.js ecosystem, we want to intercept the API call and wrap it in a retry loop that detects specific error codes. We will use TypeScript for type safety and the official @anthropic-ai/sdk.

Prerequisites

Ensure you have the SDK installed:

npm install @anthropic-ai/sdk

The Implementation

This utility function wraps the completion call. It parses the error type and applies a randomized backoff strategy.

import Anthropic from '@anthropic-ai/sdk';

// Configuration for reliability
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 32000;

/**
 * Calculates delay with Full Jitter strategy.
 * Formula: random_between(0, min(cap, base * 2 ** attempt))
 */
const calculateDelay = (attempt: number): number => {
  const exponentialDelay = BASE_DELAY_MS * Math.pow(2, attempt);
  const cappedDelay = Math.min(exponentialDelay, MAX_DELAY_MS);
  // Apply full jitter: random value between 0 and cappedDelay
  return Math.floor(Math.random() * cappedDelay);
};

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export async function safeMessageCreate(
  client: Anthropic,
  params: Anthropic.MessageCreateParams
): Promise<Anthropic.Message> {
  let attempt = 0;

  while (true) {
    try {
      // Attempt the API call
      return await client.messages.create(params);
    } catch (error) {
      attempt++;

      // If we've exhausted retries, throw the error to the caller
      if (attempt > MAX_RETRIES) {
        throw error;
      }

      // Check if error is retry-able (529 or 429)
      if (error instanceof Anthropic.APIError) {
        const isOverloaded = error.status === 529;
        const isRateLimited = error.status === 429;

        if (isOverloaded || isRateLimited) {
          const delay = calculateDelay(attempt);
          
          console.warn(
            `[Anthropic API] Error ${error.status}. Retrying attempt ${attempt}/${MAX_RETRIES} in ${delay}ms...`
          );
          
          // If rate limited, honor the Retry-After header when present.
          // Depending on SDK version, error.headers may be a plain object
          // or a fetch Headers instance, so support both shapes.
          const rawHeaders = error.headers as any;
          const retryAfter =
            typeof rawHeaders?.get === 'function'
              ? rawHeaders.get('retry-after')
              : rawHeaders?.['retry-after'];
          const finalWait = retryAfter
            ? Math.max(parseInt(retryAfter, 10) * 1000, delay)
            : delay;

          await sleep(finalWait);
          continue;
        }
      }

      // If error is not transient (e.g. 400 Bad Request), throw immediately
      throw error;
    }
  }
}

// Usage Example
async function main() {
  // Disable the SDK's built-in retries so our wrapper controls backoff
  const anthropic = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY,
    maxRetries: 0,
  });

  try {
    const response = await safeMessageCreate(anthropic, {
      model: "claude-3-opus-20240229",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Explain quantum entanglement." }],
    });
    console.log(response.content);
  } catch (err) {
    console.error("Final failure after retries:", err);
  }
}

Solution 2: Python (Async)

For Python backend services (FastAPI, Django, Flask), we typically use the async client for non-blocking I/O. We will implement a decorator pattern. This makes it easy to apply retry logic to any function calling the LLM.

Prerequisites

pip install anthropic

The Implementation

We will handle anthropic.RateLimitError and anthropic.APIStatusError (which covers 529).

import asyncio
import random
import logging
from functools import wraps
from typing import Callable, Any

from anthropic import AsyncAnthropic, RateLimitError, APIStatusError

# Configure Logging
logger = logging.getLogger("anthropic_retry")
logging.basicConfig(level=logging.INFO)

def anthropic_retry(
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0
):
    """
    Decorator for retry logic with Exponential Backoff and Jitter.
    Targets 429 (Rate Limit) and 529 (Overloaded).
    """
    def decorator(func: Callable[..., Any]):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return await func(*args, **kwargs)
                
                except (RateLimitError, APIStatusError) as e:
                    # Reraise 4xx errors that aren't 429 (e.g., 400 Bad Request, 401 Unauthorized)
                    if isinstance(e, APIStatusError) and e.status_code not in [429, 529]:
                        raise e

                    attempt += 1
                    if attempt > max_retries:
                        logger.error(f"Max retries reached for Anthropic API. Last error: {e}")
                        raise e
                    
                    # Full jitter: random wait in [0, min(base * 2^attempt, cap)]
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    final_wait = random.uniform(0, delay)

                    # Respect 'retry-after' header if present in 429
                    if isinstance(e, RateLimitError) and e.response is not None:
                        retry_header = e.response.headers.get('retry-after')
                        if retry_header:
                            try:
                                header_wait = float(retry_header)
                                # Add small buffer to header wait time
                                final_wait = max(header_wait + 0.5, final_wait)
                            except ValueError:
                                pass

                    error_type = "Overloaded" if e.status_code == 529 else "Rate Limited"
                    logger.warning(
                        f"{error_type} (Status {e.status_code}). "
                        f"Retrying attempt {attempt}/{max_retries} in {final_wait:.2f}s."
                    )
                    
                    await asyncio.sleep(final_wait)
        return wrapper
    return decorator

# Usage Example
# Disable the SDK's built-in retries; the decorator handles backoff
client = AsyncAnthropic(max_retries=0)

@anthropic_retry(max_retries=5)
async def generate_content(prompt: str):
    message = await client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return message

async def main():
    try:
        response = await generate_content("Write a haiku about recursion.")
        print(response.content[0].text)
    except Exception as e:
        print(f"Application Error: {e}")

if __name__ == "__main__":
    asyncio.run(main())

Deep Dive: Why This Code Works

1. Handling the "Thundering Herd"

Notice the usage of random.uniform in Python and Math.random() in Node.js. If your application sends a batch of 500 requests and the API overloads, a standard retry loop (e.g., wait 5 seconds) would cause all 500 requests to hit the server again at the exact same millisecond 5 seconds later.

By adding randomness (Jitter), we spread the retry load across a time window. This increases the statistical probability of your request finding an open slot on Anthropic's load balancer.
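A quick, self-contained simulation makes this concrete. `retry_times` is a hypothetical helper that computes when each client in a failed batch will retry:

```python
import random

def retry_times(n_clients: int, attempt: int, jitter: bool, base: float = 1.0) -> list[float]:
    """If n_clients all fail at t=0, return each client's retry time (seconds)."""
    delay = base * (2 ** attempt)
    if jitter:
        # Full jitter spreads the retries across the whole [0, delay) window
        return [random.uniform(0, delay) for _ in range(n_clients)]
    # Without jitter, every client retries at exactly the same instant
    return [delay] * n_clients
```

Without jitter, 500 clients produce a single retry instant; with jitter, the same 500 retries arrive spread across the full backoff window.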

2. Respecting Retry-After

While exponential backoff is excellent for 529 errors (where the server doesn't know when it will be free), 429 errors are deterministic. If Anthropic sends a header retry-after: 10, retrying in 2 seconds is futile. Both implementations above inspect headers to prioritize the server's explicit instruction over our calculated backoff.

3. Idempotency Considerations

These retry mechanisms assume the operation is idempotent. In the context of LLM Chat Completions, retrying a messages.create call is generally safe. However, be cautious if your LLM calls trigger side effects (like writing to a database) before the API responds. Ensure your retry logic only wraps the API call, not the surrounding business logic.
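A synchronous sketch of that boundary, with `call_api` and `save_result` as hypothetical stand-ins for your API call and your side effect:

```python
def process(call_api, save_result, retries: int = 3):
    """Retry only the API call; run the side effect exactly once, afterwards."""
    last_err = None
    for _ in range(retries):
        try:
            result = call_api()            # only this span is retried
            break
        except RuntimeError as err:        # stand-in for a transient API error
            last_err = err
    else:
        raise last_err
    save_result(result)                    # side effect runs exactly once
    return result
```

If `save_result` lived inside the retried span, a flaky API would write to your database once per attempt instead of once per request.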

Common Pitfalls and Production Tips

The Timeout Trap

Sometimes, the API won't return a 429 or 529; it will simply hang. Ensure you configure a client-side timeout in the SDK.

  • Node: new Anthropic({ timeout: 60000 })
  • Python: AsyncAnthropic(timeout=60.0)

If a request times out, you should generally treat it as a retry-able 5xx error.
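One way to fold timeouts into the same retry decision is a small classifier. The exception classes here are local stand-ins that mirror the SDK's names (anthropic.APITimeoutError, anthropic.APIStatusError) so the sketch stays self-contained:

```python
class APITimeoutError(Exception):
    """Stand-in for the SDK's timeout exception."""

class APIStatusError(Exception):
    """Stand-in for the SDK's HTTP-status exception."""
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

def is_retryable(err: Exception) -> bool:
    if isinstance(err, APITimeoutError):
        return True   # a hang is best treated like a transient server error
    if isinstance(err, APIStatusError):
        return err.status_code in (429, 529)
    return False
```

Routing every failure through one predicate keeps the retry policy in a single place instead of scattered across except blocks.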

Context Window Errors (400)

Do not retry 400 Bad Request errors. An invalid_request_error means the request itself is malformed; a common cause is exceeding the model's context window. Retrying this will fail 100% of the time and burn CPU cycles. The code examples above specifically filter for 429 and 529 status codes to avoid this loop.

Conclusion

Reliability in LLM applications isn't about preventing errors—it's about handling them gracefully. By implementing Exponential Backoff with Jitter, you transform catastrophic 529 crashes into minor latency spikes that your users might not even notice.

Whether you are using Node.js or Python, the key is to be aggressive with your backoff times but gentle with your concurrency. Copy the wrappers above into your utility library today to harden your production environment against peak-hour instability.