Anthropic's Claude API enforces rate limits per usage tier. New accounts start at Tier 1; tiers level up automatically as you spend, unlocking higher per-minute and per-day quotas.
The two limit dimensions
Requests per minute (RPM). Independent of token volume.
Tokens per minute (TPM). Sum of input + output tokens per minute. Per model.
Tokens per day (TPD). Daily cap. Per model.
Tier progression (typical 2026 values — confirm in console)
Tier
Trigger
Approx RPM
Approx TPM
Tier 1
Account creation
50
40k–50k
Tier 2
$40 spent + 7d
1,000
80k–400k
Tier 3
$200 spent + 7d
2,000
160k–800k
Tier 4
$400 spent + 14d
4,000
400k–2M
Numbers are illustrative — Anthropic adjusts tier thresholds. Check the rate-limits page in your console for live values.
Hitting a limit returns HTTP 429
The response includes retry-after headers. Implement exponential backoff with jitter — most SDKs do this automatically.
How to scale past Tier 4
Sustained spend automatically escalates you to higher tiers.
For enterprise needs, contact Anthropic sales for custom limits.
The Batch API has separate (much higher) limits — use it for any bulk workload.
Strategy: stay under the cap
Cache aggressively. Cached reads count against your TPM at the cached-read rate but are billed at 10% of input — they reduce both cost and (effectively) limit pressure.
Route low-difficulty traffic to Haiku. Haiku has separate (higher) TPM limits than Sonnet/Opus on each tier.
Use Batch for the long tail. Non-real-time work shouldn't compete with your real-time quota.
Anthropic levels accounts up automatically based on sustained spend and account age (e.g., $40 + 7 days to reach Tier 2, $200 + 7 days for Tier 3). For limits beyond Tier 4, contact Anthropic sales. You cannot manually request a higher tier in-console before hitting the spend threshold.
What happens when I hit a rate limit?
The API returns HTTP 429 with a retry-after header indicating when to retry. Implement exponential backoff with jitter — the official Anthropic SDKs do this automatically. Repeated 429s suggest you should upgrade tier, route to a lower-volume model, or shift work to the Batch API.
Does the Batch API count against my real-time rate limit?
No. The Batch API has separate (much higher) limits than the real-time API. Shifting bulk workloads to Batch frees up real-time quota for user-facing requests — and saves 50% on cost too.