A 529 Overloaded response from the Anthropic API means Claude itself is at capacity — not that your account hit a rate limit, and not that your request was malformed. It's the API's way of saying "try again in a moment." Unlike a 429 (which is per-account rate limiting and is your problem to fix), 529 is global to Anthropic's infrastructure and resolves on its own. But if you don't handle it deliberately, a 529 spike during a model launch or peak US working hours can take your service down.
The type: "overloaded_error" field in the response body is the canonical signal. Branch on that, not just the status code, since transient infrastructure errors can also surface as 5xx with different bodies.
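A minimal sketch of branching on the body rather than the status code alone. The nested body shape ({ type: "error", error: { type: "overloaded_error" } }) is an assumption about how the error is serialized, and classifyError and readErrorType are hypothetical helper names, not part of any SDK.

```typescript
// Assumed body shape: { type: "error", error: { type: "overloaded_error", ... } }.
interface ErrorBody {
  type?: string;
  error?: { type?: string; message?: string };
}

type ErrorClass = "overloaded" | "other_server_error" | "other";

// Hypothetical helper: pull the error type string out of a parsed body, if any.
function readErrorType(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const parsed = body as ErrorBody;
  return parsed.error?.type;
}

// Classify by body first, then fall back to the status code.
function classifyError(status: number, body: unknown): ErrorClass {
  if (readErrorType(body) === "overloaded_error") return "overloaded";
  if (status >= 500) return "other_server_error";
  return "other";
}
```

A 529 whose body is an HTML error page from an intermediate proxy would classify as other_server_error here, which lets you log it separately from genuine overloads.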
When 529s spike
Model launch days. When Anthropic ships a new Opus or Sonnet, the first 24–72 hours see elevated 529 rates as capacity is provisioned.
US business hours. Roughly 14:00–22:00 UTC, the global pool is densest. Opus 4.7 typically sees more pressure than Sonnet or Haiku.
Long-context requests. Requests over 100k tokens are more expensive to serve and more likely to hit a 529 under load — they're queued differently.
Extended thinking. Thinking-mode requests hold capacity longer and are first to be shed under pressure.
The retry pattern
The official SDKs retry 529 (and 429, 408, 500, 502, 503, 504) automatically with exponential backoff — up to 2 retries by default. That's adequate for a brief overload but insufficient for a sustained one. Bump to 4–6 retries with jitter in production:
import { Anthropic, APIError } from "@anthropic-ai/sdk";

const client = new Anthropic({
  maxRetries: 5,   // SDK default is 2
  timeout: 90_000, // milliseconds
});

async function askWithFallback(messages) {
  try {
    return await client.messages.create({
      model: "claude-opus-4-7",
      max_tokens: 1024,
      messages,
    });
  } catch (err) {
    if (err instanceof APIError && err.status === 529) {
      // After SDK retries exhausted: fall back to Sonnet.
      return await client.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages,
      });
    }
    // Anything other than a 529 is not handled here.
    throw err;
  }
}
Three patterns that scale
Tier fallback. If Opus 529s after retries, fall back to Sonnet. If Sonnet 529s, fall back to Haiku. Quality degrades gracefully; the user gets a response.
Provider fallback. If Claude is unreachable for >60s, route to a backup (different model or a cached response). Use a circuit breaker so a long Anthropic incident doesn't keep retrying forever.
Queue and replay. For non-latency-critical work, push failed requests to a retry queue (SQS, Redis Streams) with a 30–120 second visibility delay. Most 529 storms resolve within 5 minutes.
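The circuit breaker mentioned under provider fallback can be sketched as follows. This is one simple design under stated assumptions, not a definitive implementation: the class name, the 60-second failure window, the 30-second cooldown, and the injectable clock are all illustrative choices, and the caller is expected to record only retryable failures (such as exhausted 529 retries).

```typescript
// Opens after failures have persisted for failureWindowMs, then routes
// traffic to the backup for cooldownMs before allowing a trial call.
class CircuitBreaker {
  private firstFailureAt: number | null = null;
  private openedAt: number | null = null;
  private readonly failureWindowMs: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureWindowMs = 60_000, cooldownMs = 30_000, now: () => number = Date.now) {
    this.failureWindowMs = failureWindowMs;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True while calls should skip the primary and go straight to the backup.
  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown over: allow a trial call. firstFailureAt is kept, so a
      // single further failure re-opens the breaker immediately.
      this.openedAt = null;
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.firstFailureAt = null;
    this.openedAt = null;
  }

  recordFailure(): void {
    const t = this.now();
    if (this.firstFailureAt === null) this.firstFailureAt = t;
    if (t - this.firstFailureAt >= this.failureWindowMs) this.openedAt = t;
  }
}

// Try the primary unless the breaker is open; use the backup once it trips.
async function callWithBreaker<T>(
  breaker: CircuitBreaker,
  primary: () => Promise<T>,
  backup: () => Promise<T>,
): Promise<T> {
  if (breaker.isOpen()) return backup();
  try {
    const result = await primary();
    breaker.recordSuccess();
    return result;
  } catch (err) {
    breaker.recordFailure();
    if (breaker.isOpen()) return backup();
    throw err;
  }
}
```

Here primary could wrap a function like askWithFallback, and backup could return a cached response. Any success resets the breaker, so intermittent failures never trip it; only an unbroken run of failures lasting the full window does.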
What not to do
Don't retry tight. Retrying every 100ms makes the overload worse. Backoff plus full jitter is essential.
Don't treat 529 as fatal. Surfacing "API down" to users on a transient 529 is unnecessary if you have a fallback path.
Don't ignore the Retry-After header. Anthropic sometimes includes hint values; respect them.
Don't retry 4xx (other than 429 and 408). A 400 means your request is broken; retrying just wastes capacity.
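If you implement retries yourself rather than relying on the SDK, the rules above might look like the sketch below. The function names, the 500 ms base, and the 30-second cap are illustrative assumptions; the retryable status list is the one given earlier in this article. The random source is injectable so the delay can be tested deterministically.

```typescript
// Status codes worth retrying, per the list in "The retry pattern".
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504, 529]);

function shouldRetry(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Full jitter: pick a uniform random delay in [0, min(cap, base * 2^attempt)).
// If the server sent a usable Retry-After value (in seconds), honor it instead.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  random: () => number = Math.random,
  baseMs = 500,
  capMs = 30_000,
): number {
  if (
    retryAfterSeconds !== undefined &&
    Number.isFinite(retryAfterSeconds) &&
    retryAfterSeconds >= 0
  ) {
    return retryAfterSeconds * 1000;
  }
  const ceilingMs = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceilingMs);
}
```

Because the delay is drawn from the whole interval rather than clustered near the ceiling, clients that failed at the same moment spread their retries out instead of returning in a synchronized wave.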
Pricing implication
Tier fallback changes your bill: if 5% of Opus traffic falls back to Sonnet during peak hours, your effective cost per token drops slightly. Sonnet costs $3/$15 per million input/output tokens vs. Opus at $15/$75, so the fallback is cheaper, but your quality metrics may shift. Track the fallback rate and alert when it crosses 10%.
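The arithmetic can be sketched in a few lines. The prices are the ones quoted above, read as dollars per million tokens; the function names and the Price shape are hypothetical, and the blend assumes fallback requests have the same token mix as the rest of your traffic.

```typescript
interface Price {
  inputPerMTok: number;  // dollars per million input tokens
  outputPerMTok: number; // dollars per million output tokens
}

// Share of requests that were served by the fallback tier.
function fallbackRate(fallbackCount: number, totalCount: number): number {
  return totalCount === 0 ? 0 : fallbackCount / totalCount;
}

// Effective price when a fraction `rate` of traffic is served by the fallback.
function blendedPrice(primary: Price, fallback: Price, rate: number): Price {
  return {
    inputPerMTok: primary.inputPerMTok * (1 - rate) + fallback.inputPerMTok * rate,
    outputPerMTok: primary.outputPerMTok * (1 - rate) + fallback.outputPerMTok * rate,
  };
}

// Alert once the fallback rate crosses the threshold (10% by default).
function shouldAlert(rate: number, threshold = 0.1): boolean {
  return rate > threshold;
}

const opus: Price = { inputPerMTok: 15, outputPerMTok: 75 };
const sonnet: Price = { inputPerMTok: 3, outputPerMTok: 15 };
const effective = blendedPrice(opus, sonnet, fallbackRate(50, 1000));
```

At a 5% fallback rate the effective price works out to roughly $14.40 per million input tokens and $72.00 per million output tokens, a saving of about 4%, so the signal worth alerting on is the fallback rate itself rather than the cost.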
What does a Claude 529 error mean?
529 (Overloaded) means Claude's infrastructure is at capacity globally, not that your account hit a rate limit. It's transient. The official Anthropic SDKs retry 529 automatically with exponential backoff; the recommended response is to let the SDK retry and add a tier fallback (e.g. Opus → Sonnet) for sustained overloads.
How long does a Claude 529 overload usually last?
Most 529 spikes resolve within 30–300 seconds. Model launch days (when new Opus or Sonnet versions ship) and peak US business hours (14:00–22:00 UTC) see longer events, occasionally 5–15 minutes. Build a 60-second circuit breaker plus tier fallback to ride through these without user impact.
Should I retry on a Claude 529 error?
Yes — 529 is explicitly retriable. Use exponential backoff with full jitter and a retry ceiling of 4–6 attempts in production. The Anthropic SDKs do this automatically up to maxRetries (default 2). For latency-tolerant work, push to a retry queue rather than holding the request thread open.
Is 529 the same as a 429 rate limit on Claude?
No. 429 is per-account rate limiting — you're calling too fast and need to slow down or upgrade your tier. 529 is global overload at the model layer and is not specific to your account. Handle them differently: 429 means respect Retry-After and reduce your own concurrency; 529 means retry and fall back to a lower tier if it persists.