How to run multiple Claude API requests concurrently in Python and TypeScript. Covers asyncio, Promise.all, rate-limit handling, and queue-based concurrency patterns for Anthropic's API.
Anthropic's API is stateless: each request is independent, so requests can run concurrently up to your rate limit. There is no built-in session or connection affinity. The main constraints are requests per minute (RPM) and tokens per minute (TPM), both of which scale with your usage tier.
| Tier | RPM (Sonnet) | TPM (Sonnet) | How to unlock |
|---|---|---|---|
| Tier 1 | 50 | 40,000 | Default on sign-up |
| Tier 2 | 1,000 | 80,000 | $40 spend |
| Tier 3 | 2,000 | 160,000 | $200 spend |
| Tier 4 | 4,000 | 400,000 | $400 spend |
Hitting these limits returns HTTP 429 with a `retry-after` header giving the number of seconds to wait. Never busy-wait; read the header and sleep for at least that long before retrying.
```python
import asyncio

import anthropic

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment


async def call_claude(semaphore: asyncio.Semaphore, prompt: str) -> str:
    # The semaphore blocks here once max_concurrency requests are in flight.
    async with semaphore:
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
    return response.content[0].text


async def run_batch(prompts: list[str], max_concurrency: int = 20) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [call_claude(semaphore, p) for p in prompts]
    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*tasks)


documents = ["First document text", "Second document text"]  # placeholder input
prompts = ["Summarize: " + text for text in documents]
results = asyncio.run(run_batch(prompts, max_concurrency=20))
```
The semaphore caps in-flight requests. Start at 20 for Tier 2, 50 for Tier 4. If you're hitting 429s, lower it or add a token-bucket layer.
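A semaphore limits how many requests are in flight, not how many start per minute, so a stream of short requests can still exceed RPM. The token-bucket layer mentioned above could look something like the minimal sketch below. `TokenBucket`, its parameters, and the injectable `clock` are illustrative names, not part of the Anthropic SDK.

```python
import asyncio
import time
from typing import Callable


class TokenBucket:
    """Hypothetical helper: permits `rate_per_minute` units per minute on average."""

    def __init__(
        self,
        rate_per_minute: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate_per_sec = rate_per_minute / 60.0
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock
        self.last_refill = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last_refill = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False without waiting otherwise."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    async def acquire(self, cost: float = 1.0) -> None:
        """Sleep (rather than busy-wait) until `cost` tokens are available."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity")
        while not self.try_acquire(cost):
            missing = cost - self.tokens
            await asyncio.sleep(missing / self.rate_per_sec)
```

One way to use it is to call `await bucket.acquire()` inside the semaphore, just before `client.messages.create(...)`. A second bucket whose `cost` is the estimated token count of each request can approximate the TPM limit in the same way.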
```python
import asyncio

from anthropic import AsyncAnthropic, RateLimitError


async def call_with_retry(client: AsyncAnthropic, prompt: str, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return await client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=512,
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError as e:
            # Prefer the server's retry-after value (seconds); fall back to
            # exponential backoff when the header is missing.
            retry_after = float(e.response.headers.get("retry-after", 2 ** attempt))
            await asyncio.sleep(retry_after)
    raise RuntimeError("Max retries exceeded")
```
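```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Runs tasks with at most `concurrency` in flight; results keep input order.
async function runWithConcurrency(
  tasks: (() => Promise<string>)[],
  concurrency: number
): Promise<string[]> {
  const results: string[] = new Array(tasks.length);
  const executing = new Set<Promise<void>>();
  for (const [i, task] of tasks.entries()) {
    // Each promise removes itself from the set once it settles.
    const p: Promise<void> = task()
      .then(r => { results[i] = r; })
      .finally(() => { executing.delete(p); });
    executing.add(p);
    // At the cap: wait for any in-flight task to finish before starting another.
    if (executing.size >= concurrency) await Promise.race(executing);
  }
  await Promise.all(executing);
  return results;
}

// Placeholder input
const prompts = ["Summarize: first document text", "Summarize: second document text"];

const tasks = prompts.map(prompt => async () => {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 512,
    messages: [{ role: "user", content: prompt }]
  });
  const block = msg.content[0];
  return block.type === "text" ? block.text : "";
});

const results = await runWithConcurrency(tasks, 20);
```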
If your workload is offline (results not needed in real time), use the Batch API instead of parallel requests. The Batch API is billed at 50% of standard pricing and is subject to its own rate limits rather than the per-minute limits above. You submit a batch of requests in a single call and poll for results. See Batch API savings guide.
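For offline workloads, the submit-and-poll flow might look like the sketch below, which uses the Message Batches methods of the Python SDK (`client.messages.batches.create`, `.retrieve`, and `.results`). The `build_batch_requests` helper, the `doc-{i}` custom IDs, and the 60-second polling interval are illustrative choices, not requirements.

```python
import time


def build_batch_requests(prompts: list[str], model: str = "claude-sonnet-4-6") -> list[dict]:
    """Hypothetical helper: one batch entry per prompt, keyed by a unique custom_id."""
    return [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": model,
                "max_tokens": 512,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


def run_batch_job(prompts: list[str], poll_seconds: float = 60.0) -> dict[str, str]:
    import anthropic  # imported here so the helper above works without the SDK installed

    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=build_batch_requests(prompts))

    # Poll until processing ends; sleep between checks instead of busy-waiting.
    while batch.processing_status != "ended":
        time.sleep(poll_seconds)
        batch = client.messages.batches.retrieve(batch.id)

    # Results can arrive in any order, so match them up by custom_id.
    outputs: dict[str, str] = {}
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            outputs[entry.custom_id] = entry.result.message.content[0].text
    return outputs
```

Entries that did not succeed are skipped here for brevity; a real pipeline would probably want to log or resubmit them.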
| Pattern | Use when | Cost vs standard |
|---|---|---|
| Async parallel (this page) | Latency matters, real-time pipeline | 1× |
| Batch API | Offline, OK to wait hours | 0.5× |
| Prompt caching + parallel | Shared system prompt across many requests | 0.1× on cached portions |
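The last row combines both ideas: mark the shared system prompt as cacheable, then fan out. A rough sketch follows. It assumes the `cache_control` field on a system text block; `build_cached_request` and `run_with_warm_cache` are hypothetical helpers, and `send` stands in for whatever coroutine issues the request (for example, a wrapper around `client.messages.create(**kwargs)`). Since the cache is populated by the first request, the sketch sends one request on its own before launching the rest in parallel.

```python
import asyncio
from typing import Any, Awaitable, Callable


def build_cached_request(system_prompt: str, user_prompt: str) -> dict[str, Any]:
    """Hypothetical helper: request kwargs with the shared system prompt marked cacheable."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }


async def run_with_warm_cache(
    send: Callable[[dict[str, Any]], Awaitable[Any]],
    requests: list[dict[str, Any]],
    max_concurrency: int = 20,
) -> list[Any]:
    if not requests:
        return []
    # Send the first request alone so later ones can read the cached prefix.
    first = await send(requests[0])
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(req: dict[str, Any]) -> Any:
        async with semaphore:
            return await send(req)

    # gather() preserves input order for the remaining requests.
    rest = await asyncio.gather(*(limited(r) for r in requests[1:]))
    return [first, *rest]
```

Prompts shorter than the model's minimum cacheable length are not cached, so this pattern only pays off when the shared prefix is reasonably long.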
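Example: 100k requests at 2k input + 500 output tokens each on Sonnet 4.6. Assuming list pricing of $3 per million input tokens and $15 per million output tokens, that works out to $0.0135 per request, or about $1,350 in total at standard rates (about $675 through the Batch API). The cost doesn't change whether you run the requests sequentially or concurrently: parallelism affects latency, not price. Use the Cost Calculator to model your workload.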
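The arithmetic behind an estimate like this is easy to check by hand. The sketch below assumes list prices of $3 per million input tokens and $15 per million output tokens for Sonnet; those figures and the `estimate_cost` helper are assumptions for illustration, so substitute the rates from the current pricing page.

```python
def estimate_cost(
    n_requests: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float = 3.00,    # assumed USD per million input tokens
    output_price_per_mtok: float = 15.00,  # assumed USD per million output tokens
    discount: float = 1.0,                 # e.g. 0.5 for the Batch API
) -> float:
    """Hypothetical helper: total USD cost for a uniform workload."""
    per_request = (
        input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    ) / 1_000_000
    return n_requests * per_request * discount


standard = estimate_cost(100_000, 2_000, 500)
batch = estimate_cost(100_000, 2_000, 500, discount=0.5)
print(f"standard: ${standard:,.2f}, batch: ${batch:,.2f}")
```

Concurrency does not appear anywhere in the formula, which is the point: only token counts and the pricing tier drive the total.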