Claude API Error Handling, Rate Limits & Retry Logic

Handle Anthropic API errors correctly: rate limits (429), overload (529), timeouts, and auth errors. Python and Node.js retry patterns with exponential backoff.


The Anthropic Python and TypeScript SDKs include automatic retry with exponential backoff by default. This page covers what each error code means, when to retry, and how to customize retry behavior for production workloads.

HTTP error codes

Status  Class                  Meaning                 Retry?
401     AuthenticationError    Invalid API key         No
403     PermissionDeniedError  Key lacks permission    No
404     NotFoundError          Bad URL or model name   No
413     BadRequestError        Request too large       No (reduce size)
429     RateLimitError         Too many requests       Yes (backoff)
500     InternalServerError    Anthropic server error  Yes
529     OverloadedError        Capacity exceeded       Yes (backoff)
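
The Retry? column can be expressed as a small lookup when you handle errors yourself. This is a minimal sketch: `RETRYABLE_STATUSES` and `should_retry` are hypothetical names, not part of the SDK, and the set only mirrors the statuses the table marks as retryable.

```python
# Hypothetical helper mirroring the "Retry?" column of the table.
RETRYABLE_STATUSES = {429, 500, 529}


def should_retry(status_code: int) -> bool:
    """Return True when the table marks this HTTP status as retryable."""
    return status_code in RETRYABLE_STATUSES


for status in (401, 413, 429, 529):
    print(status, "retry" if should_retry(status) else "do not retry")
```

Keeping the decision in one place makes it easier to keep custom retry loops consistent with the table.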

Python: default retry (built-in)

import anthropic

# SDK auto-retries on 429 and 5xx with exponential backoff (2 retries by default)
client = anthropic.Anthropic(
    max_retries=4,   # increase for batch workloads
    timeout=60.0     # seconds (connect + read)
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

Python: catch specific errors

import anthropic
from anthropic import APIStatusError, RateLimitError, APITimeoutError

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limited. Retry-After: {e.response.headers.get('retry-after')} s")
    # implement wait logic here
except APITimeoutError:
    print("Request timed out — increase timeout or reduce max_tokens")
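except APIStatusError as e:
    # 529 (overloaded) is checked by status code, which works whether or not
    # the installed SDK version exposes a dedicated exception class for it
    if e.status_code == 529:
        print("API overloaded (529): back off and retry")
    else:
        print(f"API error {e.status_code}: {e.message}")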

Python: custom exponential backoff

import random
import time

import anthropic

client = anthropic.Anthropic(max_retries=0)  # handle manually

def call_with_backoff(messages, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.APIStatusError as e:
            # retry only rate limits (429) and overload (529); re-raise everything else
            if e.status_code not in (429, 529) or attempt == max_attempts - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt+1} failed ({type(e).__name__}). Waiting {wait:.1f}s...")
            time.sleep(wait)

result = call_with_backoff([{"role": "user", "content": "Hello"}])
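
To see what the formula `(2 ** attempt) + random.uniform(0, 1)` implies in practice, the sketch below lists the waits between attempts without calling the API. `backoff_delays` is a hypothetical helper, and the `cap` parameter is an assumption added here (the loop above has no upper bound on the wait).

```python
import random


def backoff_delays(max_attempts=5, base=1.0, cap=30.0, rng=random.random):
    """Seconds slept between attempts: min(cap, base * 2**n) plus up to 1 s of jitter."""
    delays = []
    for attempt in range(max_attempts - 1):  # no sleep after the final attempt
        delays.append(min(cap, base * 2 ** attempt) + rng())
    return delays


delays = backoff_delays()
print([round(d, 1) for d in delays])
print(f"worst-case total wait: about {sum(delays):.0f}s")
```

With five attempts the base waits are 1, 2, 4, and 8 seconds, so a request that fails every time spends roughly 15 to 19 seconds sleeping before the final error is raised.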

Node.js: error handling

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ maxRetries: 4, timeout: 60_000 });

try {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }]
  });
  console.log(response.content[0].text);
} catch (err) {
  if (err instanceof Anthropic.RateLimitError) {
    // headers is a Headers object in recent SDK versions and a plain object in older ones
    const retryAfter = err.headers?.get?.("retry-after") ?? err.headers?.["retry-after"];
    console.error(`Rate limited. Retry after ${retryAfter}s`);
  } else if (err instanceof Anthropic.APIConnectionTimeoutError) {
    console.error("Timeout: reduce max_tokens or increase timeout");
  } else if (err instanceof Anthropic.APIError && err.status === 529) {
    console.error("Overloaded (529): back off and retry");
  } else {
    throw err;
  }
}

Rate limit tiers

Anthropic enforces requests-per-minute (RPM) and tokens-per-minute (TPM) limits. Limits depend on your usage tier: the lowest tier has the lowest limits, and higher tiers raise them. Check your current limits in the Anthropic Console under "Limits".
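
One way to avoid hitting the RPM limit in the first place is to throttle on the client side. The sketch below is a simple sliding-window limiter, not an SDK feature: `RequestThrottle` is a hypothetical class, the `rpm=50` value is a placeholder rather than a real tier limit, and it does not account for token limits.

```python
import time
from collections import deque


class RequestThrottle:
    """Allow at most `rpm` calls in any 60-second window (client-side only)."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self):
        """Seconds to wait before the next request may be sent."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # drop requests that left the window
        if len(self.sent) < self.rpm:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def acquire(self):
        """Block until a slot is free, then record the request."""
        while (delay := self.wait_time()) > 0:
            self.sleep(delay)
        self.sent.append(self.clock())


throttle = RequestThrottle(rpm=50)  # placeholder limit; use the value from the Console
throttle.acquire()  # call this before each client.messages.create(...)
```

The server still has the final say, so this complements retry handling for 429 responses rather than replacing it.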

For batch workloads, the Message Batches API has a separate, higher quota and costs 50% less. Pair it with the Claude Cost Calculator to size your budget.

Frequently asked questions

What does a 529 error mean from the Anthropic API?
A 529 OverloadedError means Anthropic's servers are temporarily at capacity. It differs from a 429 RateLimitError, which means you have hit your own account's rate limit. Both are safe to retry with exponential backoff, and the SDK's built-in retry logic handles both automatically.

How many retries does the Anthropic SDK do by default?
The Python and Node.js SDKs retry up to 2 times by default on 429 and 5xx errors. You can increase this with max_retries=N (Python) or maxRetries: N (Node.js). Set it to 0 to disable retries and handle them yourself.

How do I read the retry-after header from a 429 error?
In Python: err.response.headers.get('retry-after'). In Node.js: err.headers.get('retry-after') when the SDK exposes a Headers object, or err.headers['retry-after'] on versions where headers is a plain object. The value is the number of seconds to wait before retrying. If the header is absent, use exponential backoff starting at 1s.

What timeout should I set for the Claude API?
The default timeout is 10 minutes (600 s) in both the Python and Node.js SDKs. For interactive workloads, set a shorter value such as timeout=30.0 (Python, seconds) or timeout: 30_000 (Node.js, milliseconds) to fail fast. For large context windows or long generations (max_tokens > 4096), keep it at 120 to 300 s. For streaming, the timeout does not cap the total length of the response stream.
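
The retry-after answer above can be sketched as a single function: use the header when it is present, otherwise fall back to exponential backoff starting at 1 s. `retry_delay` is a hypothetical helper, the `cap` parameter is an assumption, and the sketch assumes the header holds a plain number of seconds.

```python
import random


def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Prefer the server's retry-after hint; otherwise back off exponentially from `base`."""
    raw = headers.get("retry-after")
    if raw is not None:
        try:
            return max(0.0, float(raw))
        except ValueError:
            pass  # not a plain number of seconds; fall through to backoff
    return min(cap, base * 2 ** attempt) + random.uniform(0, 1)


print(retry_delay({"retry-after": "7"}, attempt=0))
print(round(retry_delay({}, attempt=2), 1))
```

In a Python `except RateLimitError as e:` handler this would be called as `retry_delay(e.response.headers, attempt)`.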
