Handle Anthropic API errors correctly: rate limits (429), overload (529), timeouts, and auth errors. Python and Node.js retry patterns with exponential backoff.
The Anthropic Python and TypeScript SDKs include automatic retry with exponential backoff by default. This page covers what each error code means, when to retry, and how to customize retry behavior for production workloads.
| Status | Class | Meaning | Retry? |
|---|---|---|---|
| 401 | AuthenticationError | Invalid API key | No |
| 403 | PermissionDeniedError | Key lacks permission | No |
| 404 | NotFoundError | Bad URL or model name | No |
| 413 | APIStatusError (status 413) | Request too large | No (reduce size) |
| 429 | RateLimitError | Too many requests | Yes (backoff) |
| 500 | InternalServerError | Anthropic server error | Yes |
| 529 | APIStatusError (status 529) | Capacity exceeded | Yes (backoff) |
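The retry column of the table can be encoded as a small lookup so that custom retry loops make the same decision everywhere. This is a minimal sketch: `should_retry` and the two set names are hypothetical helpers, not part of the SDK, and the fallback for statuses missing from the table (retry only 5xx) is an assumption.

```python
# Statuses the table above marks as retryable vs. not retryable.
RETRYABLE_STATUSES = {429, 500, 529}
NON_RETRYABLE_STATUSES = {401, 403, 404, 413}


def should_retry(status_code: int) -> bool:
    """Return True when a response with this status is worth retrying."""
    if status_code in RETRYABLE_STATUSES:
        return True
    if status_code in NON_RETRYABLE_STATUSES:
        return False
    # Assumption: statuses not listed in the table are retried only if 5xx.
    return 500 <= status_code < 600


for code in (401, 413, 429, 529):
    print(code, "retry" if should_retry(code) else "do not retry")
```

Keeping the policy in one place makes it easy to adjust if your workload needs a stricter or looser rule.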
```python
import anthropic

# SDK auto-retries on 429 and 5xx with exponential backoff (2 retries by default)
client = anthropic.Anthropic(
    max_retries=4,  # increase for batch workloads
    timeout=60.0,   # seconds (connect + read)
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
```
```python
import anthropic
from anthropic import APIStatusError, RateLimitError, APITimeoutError

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError as e:
    print(f"Rate limited. Retry-After: {e.response.headers.get('retry-after')} s")
    # implement wait logic here
except APITimeoutError:
    print("Request timed out: increase timeout or reduce max_tokens")
except APIStatusError as e:
    # Match overload by status code, which works regardless of SDK version
    if e.status_code == 529:
        print("API overloaded (529): back off and retry")
    else:
        print(f"API error {e.status_code}: {e.message}")
```
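The rate-limit handler above leaves its wait logic open. One way to fill it in is to turn the `retry-after` header into a bounded sleep duration. The sketch below uses only the standard library; `retry_after_seconds`, its `default` and `cap` values, and the `now` parameter are hypothetical choices for illustration. It accepts both forms that HTTP allows for this header (a number of seconds or an HTTP date) and assumes the mapping is keyed by lowercase header names or is case-insensitive, as `e.response.headers` is.

```python
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=1.0, cap=60.0, now=None):
    """Convert a Retry-After header into a wait time in seconds.

    Falls back to `default` when the header is missing or unparseable,
    and never returns more than `cap` or less than zero.
    """
    raw = headers.get("retry-after")
    if raw is None:
        return default
    try:
        delay = float(raw)  # delta-seconds form, e.g. "7"
    except ValueError:
        try:
            when = parsedate_to_datetime(raw)  # HTTP-date form
        except (TypeError, ValueError):
            return default
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        current = now if now is not None else datetime.now(timezone.utc)
        delay = (when - current).total_seconds()
    if math.isnan(delay):
        return default
    return min(max(delay, 0.0), cap)
```

Inside the `except RateLimitError as e:` branch this could be used as `time.sleep(retry_after_seconds(e.response.headers))` before retrying the request.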
```python
import random
import time

import anthropic

client = anthropic.Anthropic(max_retries=0)  # disable SDK retries; handled manually below


def call_with_backoff(messages, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages,
            )
        except anthropic.APIStatusError as e:
            # Retry only rate limits (429) and overload (529); re-raise everything else
            if e.status_code not in (429, 529) or attempt == max_attempts - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)  # exponential backoff plus jitter
            print(f"Attempt {attempt + 1} failed (HTTP {e.status_code}). Waiting {wait:.1f}s...")
            time.sleep(wait)


result = call_with_backoff([{"role": "user", "content": "Hello"}])
```
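The wait in `call_with_backoff` doubles on every attempt without an upper bound, so a long retry budget can produce very long sleeps. A common variant is capped exponential backoff with full jitter, where the delay is drawn uniformly between zero and the capped exponential ceiling. The sketch below is one possible formulation, not what the SDK does internally; `backoff_delay`, `base`, `cap`, and the injectable `rng` are hypothetical names and defaults.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Return a delay in seconds for a zero-based retry attempt.

    The ceiling grows as base * 2**attempt but never exceeds `cap`;
    the actual delay is a random fraction of that ceiling (full jitter).
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


# Print the largest possible delay for the first six attempts.
for attempt in range(6):
    print(attempt, backoff_delay(attempt, rng=lambda: 1.0))
```

To use it in the loop above, `wait = backoff_delay(attempt)` would replace the existing `wait = ...` line. Passing `rng` explicitly keeps the function deterministic in tests.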
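```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ maxRetries: 4, timeout: 60_000 }); // timeout in milliseconds

try {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
  });
  // Content blocks are a union type, so narrow to a text block before reading .text
  const first = response.content[0];
  if (first?.type === "text") console.log(first.text);
} catch (err) {
  if (err instanceof Anthropic.RateLimitError) {
    // On SDK versions where err.headers is a Fetch Headers object, use err.headers.get("retry-after")
    const retryAfter = err.headers?.["retry-after"];
    console.error(`Rate limited. Retry after ${retryAfter}s`);
  } else if (err instanceof Anthropic.APIConnectionTimeoutError) {
    console.error("Timeout: reduce max_tokens or increase timeout");
  } else if (err instanceof Anthropic.APIError && err.status === 529) {
    // Overload is matched by status code rather than by a dedicated error class
    console.error("Overloaded (529): back off and retry");
  } else {
    throw err;
  }
}
```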
Anthropic enforces requests per minute (RPM) and tokens per minute (TPM) limits. Free-tier users get the lowest limits; paying users get higher defaults that increase with usage history. Check your current limits in the Anthropic Console under "Limits".
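Staying under an RPM limit on the client side avoids most 429 responses before they happen. The following is a minimal sliding-window sketch, assuming a single process and a limit you have looked up yourself; `RequestsPerMinuteLimiter` and its parameters are hypothetical, and it does not account for TPM limits.

```python
import time
from collections import deque


class RequestsPerMinuteLimiter:
    """Allow at most `max_requests` calls within any `window` seconds."""

    def __init__(self, max_requests, window=60.0, clock=time.monotonic, sleep=time.sleep):
        if max_requests < 1:
            raise ValueError("max_requests must be at least 1")
        self._max = max_requests
        self._window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests inside the window

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        while True:
            now = self._clock()
            # Forget requests that have aged out of the window.
            while self._sent and now - self._sent[0] >= self._window:
                self._sent.popleft()
            if len(self._sent) < self._max:
                self._sent.append(now)
                return
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self._window - (now - self._sent[0]))


limiter = RequestsPerMinuteLimiter(max_requests=50)  # 50 is a placeholder, not a real limit
limiter.acquire()  # call once before each API request
```

Calling `limiter.acquire()` immediately before each `client.messages.create(...)` spaces requests out; the SDK's own retries still act as a safety net for limits this sketch does not track.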
For batch workloads, the Message Batches API has a separate, higher quota and costs 50% less. Pair it with the Claude Cost Calculator to size your budget.
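As a rough illustration of the 50% figure, the arithmetic can be sketched as below. The function name `estimate_cost` and the per-million-token prices in the example are placeholders chosen for the calculation, not published pricing; substitute the rates for your model.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok,
                  batch=False):
    """Estimate cost in dollars; batch jobs are billed at half the standard rate."""
    standard = (input_tokens / 1_000_000) * input_price_per_mtok \
        + (output_tokens / 1_000_000) * output_price_per_mtok
    return standard * 0.5 if batch else standard


# Placeholder prices: 3.0 and 15.0 dollars per million input/output tokens.
standard = estimate_cost(2_000_000, 500_000, 3.0, 15.0)
batched = estimate_cost(2_000_000, 500_000, 3.0, 15.0, batch=True)
print(f"standard: ${standard:.2f}, batch: ${batched:.2f}")
```

With these placeholder inputs the standard estimate is 13.50 and the batch estimate is 6.75, which is the halving described above.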