Claude API Parallel Requests: Concurrency, Rate Limits & Queue Patterns (2026)

How to run multiple Claude API requests concurrently in Python and TypeScript. Covers asyncio, Promise.all, rate-limit handling, and queue-based concurrency patterns for Anthropic's API.

Anthropic's API is stateless — each request is independent and can run concurrently up to your rate limit. There is no built-in session or connection affinity. The main constraint is tokens per minute (TPM) and requests per minute (RPM), which scale with your usage tier.

Rate limit tiers (2026)

Tier	RPM (Sonnet)	TPM (Sonnet)	How to unlock
Tier 1	50	40,000	Default on sign-up
Tier 2	1,000	80,000	$40 spend
Tier 3	2,000	160,000	$200 spend
Tier 4	4,000	400,000	$400 spend

Hitting these limits returns HTTP 429 with a Retry-After header. Never busy-wait; read the header and sleep exactly that long.

Python: asyncio + semaphore pattern

The semaphore caps in-flight requests. Start at 20 for Tier 2, 50 for Tier 4. If you're hitting 429s, lower it or add a token-bucket layer.

Python: rate-limit aware retry

TypeScript: Promise.all with concurrency limit

When to use Batch API instead

If your workload is offline (results not needed in real time), use the Batch API instead of parallel requests. Batch API runs at 50% of standard pricing with no rate-limit contention — you submit a JSONL file and poll for results. See Batch API savings guide.

Cost at scale

Pattern	Use when	Cost vs standard
Async parallel (this page)	Latency matters, real-time pipeline	1×
Batch API	Offline, OK to wait hours	0.5×
Prompt caching + parallel	Shared system prompt across many requests	0.1× on cached portions

100k parallel requests × 2k input + 500 output tokens each (Sonnet 4.6): $0.675 total. The cost doesn't change whether you run them sequentially or concurrently — parallelism affects latency, not price. Use the Cost Calculator to model your workload.

Frequently asked questions

How many concurrent Claude API requests can I make?

There's no explicit concurrency limit — the constraint is tokens per minute (TPM) and requests per minute (RPM). On Tier 2 (after $40 spend) you get 1,000 RPM and 80,000 TPM for Sonnet 4.6. Run as many concurrent requests as you can without exceeding these limits; a semaphore of 20–50 is typical.

What happens when I hit the Claude API rate limit?

You get HTTP 429 with a Retry-After header specifying how many seconds to wait. Read this header and sleep exactly that long — do not exponential-backoff on 429 (you'll overshoot the recovery window). Use RateLimitError in the Anthropic SDK to catch it cleanly.

Is the Anthropic Batch API faster than parallel requests?

No — Batch API is designed for offline workloads with results available within 24 hours. It's 50% cheaper but has no latency guarantee. Parallel async requests are faster; Batch API is cheaper for non-time-sensitive work.

Can I parallelize Claude API calls in Node.js?

Yes. Use Promise.all for small batches or a concurrency-limiter like p-limit for larger ones. The @anthropic-ai/sdk client is fully async-friendly. The limiting factor is your rate tier, not Node.js concurrency.

Does running Claude API requests in parallel cost more?

No. The cost per request (input tokens × price + output tokens × price) is the same regardless of concurrency. Parallelism reduces wall-clock time, not per-request cost. Batch API is the way to reduce actual cost by 50%.

Free tools

Running Claude API Requests in Parallel