Running Claude API Requests in Parallel

How to run multiple Claude API requests concurrently in Python and TypeScript. Covers asyncio, Promise.all, rate-limit handling, and queue-based concurrency patterns for Anthropic's API.


Anthropic's API is stateless: each request is independent and can run concurrently up to your rate limit. There is no built-in session or connection affinity. The main constraints are tokens per minute (TPM) and requests per minute (RPM), both of which scale with your usage tier.

Rate limit tiers (2026)

Tier   | RPM (Sonnet) | TPM (Sonnet) | How to unlock
Tier 1 | 50           | 40,000       | Default on sign-up
Tier 2 | 1,000        | 80,000       | $40 spend
Tier 3 | 2,000        | 160,000      | $200 spend
Tier 4 | 4,000        | 400,000      | $400 spend

Hitting these limits returns HTTP 429 with a Retry-After header. Never busy-wait; read the header and sleep exactly that long.

Python: asyncio + semaphore pattern

import asyncio
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.AsyncAnthropic()

async def call_claude(semaphore: asyncio.Semaphore, prompt: str) -> str:
    # The semaphore blocks here once max_concurrency requests are in flight.
    async with semaphore:
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text

async def run_batch(prompts: list[str], max_concurrency: int = 20) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [call_claude(semaphore, p) for p in prompts]
    # gather returns results in the same order as prompts.
    return await asyncio.gather(*tasks)

# Placeholder inputs; replace with your own texts.
documents = ["First document text.", "Second document text."]
prompts = ["Summarize: " + text for text in documents]
results = asyncio.run(run_batch(prompts, max_concurrency=20))

The semaphore caps in-flight requests. Start at 20 for Tier 2, 50 for Tier 4. If you're hitting 429s, lower it or add a token-bucket layer.
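A token-bucket layer could look roughly like the sketch below. It is a minimal illustration, not part of the Anthropic SDK: the `TokenBucket` class and its `rate` and `capacity` parameters are hypothetical names chosen here. Each request takes one token before it is sent, and tokens refill at a steady rate, so the average request rate stays under the RPM budget while short bursts are still allowed.

```python
import asyncio
import time


class TokenBucket:
    """Async token bucket (hypothetical helper): refills at `rate` tokens
    per second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()
        self.lock = asyncio.Lock()

    async def acquire(self, amount: float = 1.0) -> None:
        if amount > self.capacity:
            raise ValueError("amount exceeds bucket capacity")
        # Waiters queue on the lock, so tokens are handed out in arrival order.
        async with self.lock:
            while True:
                now = time.monotonic()
                # Refill for the time elapsed since the last check.
                elapsed = now - self.updated
                self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                self.updated = now
                if self.tokens >= amount:
                    self.tokens -= amount
                    return
                # Sleep just long enough for the shortfall to refill.
                await asyncio.sleep((amount - self.tokens) / self.rate)


async def demo() -> float:
    # Create the bucket inside the running event loop.
    bucket = TokenBucket(rate=100.0, capacity=2)
    start = time.monotonic()
    for _ in range(4):
        await bucket.acquire()  # would precede each client.messages.create(...)
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"4 acquisitions took {asyncio.run(demo()):.3f}s")
```

For a 1,000 RPM budget you might try `TokenBucket(rate=1000 / 60, capacity=20)` and call `await bucket.acquire()` inside the semaphore block; a second bucket sized in tokens rather than requests could approximate the TPM limit in the same way.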

Python: rate-limit aware retry

import asyncio
from anthropic import RateLimitError

async def call_with_retry(client, prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return await client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=512,
                messages=[{"role": "user", "content": prompt}]
            )
        except RateLimitError as e:
            # Prefer the server's Retry-After value; fall back to exponential
            # backoff (1s, 2s, 4s, ...) only when the header is missing.
            retry_after = float(e.response.headers.get("retry-after", 2 ** attempt))
            await asyncio.sleep(retry_after)
    raise RuntimeError("Max retries exceeded")

TypeScript: Promise.all with concurrency limit

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

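async function runWithConcurrency(
  tasks: (() => Promise<string>)[],
  concurrency: number
): Promise<string[]> {
  // Results are stored by input index, so output order matches input order.
  const results: string[] = new Array(tasks.length);
  const executing = new Set<Promise<void>>();

  for (const [i, task] of tasks.entries()) {
    // Each promise removes itself from the in-flight set once it settles.
    const p: Promise<void> = task()
      .then(r => { results[i] = r; })
      .finally(() => { executing.delete(p); });
    executing.add(p);
    // At the cap: wait for any in-flight task to finish before starting another.
    if (executing.size >= concurrency) await Promise.race(executing);
  }
  await Promise.all(executing);
  return results;
}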

// `prompts` is assumed to be a string[] built elsewhere.
const tasks = prompts.map(prompt => async () => {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 512,
    messages: [{ role: "user", content: prompt }]
  });
  return msg.content[0].type === "text" ? msg.content[0].text : "";
});

const results = await runWithConcurrency(tasks, 20);

When to use Batch API instead

If your workload is offline (results not needed in real time), use the Batch API instead of parallel requests. Batch API runs at 50% of standard pricing and does not compete with your real-time requests for rate limit: you submit a list of requests in a single call, poll until the batch finishes, then download the results (returned as JSONL). See Batch API savings guide.
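The submit-and-poll flow can be sketched as follows. This is a hedged outline rather than a definitive implementation: `build_batch_requests` and `run_batch_job` are hypothetical helper names, `client` is assumed to be a synchronous `anthropic.Anthropic()` instance, and the `client.messages.batches` calls and result fields reflect the Python SDK's Message Batches interface as understood here, so check them against the current SDK reference before relying on them.

```python
import time


def build_batch_requests(prompts, model="claude-sonnet-4-6", max_tokens=512):
    """Build one batch entry per prompt; custom_id lets you match results back."""
    return [
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


def run_batch_job(client, prompts, poll_seconds=60):
    """Submit a batch, poll until it ends, and return {custom_id: text}."""
    batch = client.messages.batches.create(requests=build_batch_requests(prompts))
    while batch.processing_status != "ended":
        time.sleep(poll_seconds)
        batch = client.messages.batches.retrieve(batch.id)

    texts = {}
    for entry in client.messages.batches.results(batch.id):
        # Errored, canceled, or expired entries are skipped in this sketch.
        if entry.result.type == "succeeded":
            texts[entry.custom_id] = entry.result.message.content[0].text
    return texts
```

Results are not guaranteed to come back in submission order, which is why the sketch keys them by `custom_id` instead of relying on position.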

Pattern                    | Use when                                  | Cost vs standard
Async parallel (this page) | Latency matters, real-time pipeline       | 1×
Batch API                  | Offline, OK to wait hours                 | 0.5×
Prompt caching + parallel  | Shared system prompt across many requests | 0.1× on cached portions
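For the prompt caching row, one wrinkle is that a cache entry only becomes usable once the first response has started, so firing every request at the same instant can leave most of them paying full price. The sketch below sends one request first to warm the cache, then fans out the rest. `cached_request` and `warm_then_fan_out` are hypothetical helpers, and `send` is assumed to be any coroutine function that takes request parameters, for example `lambda params: client.messages.create(**params)`. Caching also requires the shared prefix to exceed a minimum length, so very short system prompts will not be cached.

```python
import asyncio


def cached_request(system_prompt, user_prompt, model="claude-sonnet-4-6", max_tokens=512):
    """Request params whose system prompt is marked as a cacheable prefix."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }


async def warm_then_fan_out(send, system_prompt, user_prompts, max_concurrency=20):
    """Send the first request alone to populate the cache, then run the rest in parallel."""
    if not user_prompts:
        return []
    first = await send(cached_request(system_prompt, user_prompts[0]))

    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(prompt):
        async with semaphore:
            return await send(cached_request(system_prompt, prompt))

    rest = await asyncio.gather(*(limited(p) for p in user_prompts[1:]))
    # Output order matches the order of user_prompts.
    return [first, *rest]
```

The trade-off is one extra round trip of latency before the parallel phase begins, which is usually small compared with the savings on a long shared prefix.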

Cost at scale

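100k requests at 2k input + 500 output tokens each (Sonnet 4.6) is 200M input tokens and 50M output tokens. Assuming list prices of $3 per million input tokens and $15 per million output tokens, that comes to about $1,350 at standard rates ($600 input + $750 output), or about $675 through the Batch API. The cost doesn't change whether you run them sequentially or concurrently: parallelism affects latency, not price. Use the Cost Calculator to model your workload.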
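The arithmetic behind an estimate like this is easy to reproduce. The sketch below uses a hypothetical helper, `workload_cost_usd`, and assumes prices of $3 per million input tokens and $15 per million output tokens; substitute the current published prices for your model before trusting the numbers.

```python
def workload_cost_usd(
    num_requests: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    price_multiplier: float = 1.0,
) -> float:
    """Total cost in USD; price_multiplier=0.5 models the Batch API discount."""
    input_cost = num_requests * input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = num_requests * output_tokens / 1_000_000 * output_price_per_mtok
    return (input_cost + output_cost) * price_multiplier


# Assumed prices: $3 per million input tokens, $15 per million output tokens.
standard = workload_cost_usd(100_000, 2_000, 500, 3.0, 15.0)
batch = workload_cost_usd(100_000, 2_000, 500, 3.0, 15.0, price_multiplier=0.5)
print(f"standard: ${standard:,.2f}, batch: ${batch:,.2f}")
# prints "standard: $1,350.00, batch: $675.00"
```

Concurrency does not appear anywhere in the formula, which is the point: only token counts, prices, and any discount multiplier affect the total.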

Frequently asked questions

How many concurrent Claude API requests can I make?
There's no explicit concurrency limit — the constraint is tokens per minute (TPM) and requests per minute (RPM). On Tier 2 (after $40 spend) you get 1,000 RPM and 80,000 TPM for Sonnet 4.6. Run as many concurrent requests as you can without exceeding these limits; a semaphore of 20–50 is typical.
What happens when I hit the Claude API rate limit?
You get HTTP 429 with a Retry-After header specifying how many seconds to wait. Read this header and sleep exactly that long instead of applying blind exponential backoff, which can overshoot the recovery window; keep backoff only as a fallback for responses where the header is missing. Use RateLimitError in the Anthropic SDK to catch it cleanly.
Is the Anthropic Batch API faster than parallel requests?
No — Batch API is designed for offline workloads with results available within 24 hours. It's 50% cheaper but has no latency guarantee. Parallel async requests are faster; Batch API is cheaper for non-time-sensitive work.
Can I parallelize Claude API calls in Node.js?
Yes. Use Promise.all for small batches or a concurrency-limiter like p-limit for larger ones. The @anthropic-ai/sdk client is fully async-friendly. The limiting factor is your rate tier, not Node.js concurrency.
Does running Claude API requests in parallel cost more?
No. The cost per request (input tokens × price + output tokens × price) is the same regardless of concurrency. Parallelism reduces wall-clock time, not per-request cost. Batch API is the way to reduce actual cost by 50%.
