Using Claude API with Python asyncio (AsyncAnthropic)

Use AsyncAnthropic to call Claude in async Python applications. This guide covers concurrent requests with asyncio.gather, async streaming, FastAPI integration, and bounding concurrency to stay within rate limits.


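The Anthropic Python SDK ships an AsyncAnthropic client with the same interface as the sync Anthropic client, except that its request methods are asynchronous: messages.create returns a coroutine that you await, and messages.stream is entered with async with. Use it whenever your code runs inside an asyncio event loop, such as FastAPI, Starlette, or any other event-loop-based server.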
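Because messages.create on the async client is a coroutine function, forgetting await is a common bug: you get a coroutine object back instead of a response. The sketch below shows the difference using a hypothetical stand-in, fake_create, that sleeps instead of calling the API, so it runs without a key. With the real client the pattern is simply await client.messages.create(...).

```python
import asyncio
import inspect


async def fake_create(prompt: str) -> str:
    """Hypothetical stand-in for client.messages.create (makes no network call)."""
    await asyncio.sleep(0.01)  # simulate request latency
    return f"reply to: {prompt}"


async def main() -> tuple[bool, str]:
    pending = fake_create("hi")  # no await: this is only a coroutine object
    was_coroutine = inspect.iscoroutine(pending)
    reply = await pending  # awaiting it is what actually runs the "request"
    return was_coroutine, reply


was_coroutine, reply = asyncio.run(main())
print(was_coroutine, reply)  # True reply to: hi
```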

Basic async call

import asyncio
import anthropic

async def main():
    client = anthropic.AsyncAnthropic()

    message = await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "What is the capital of France?"}]
    )
    print(message.content[0].text)

asyncio.run(main())

Concurrent requests with asyncio.gather

This is the main reason to use async: start N requests concurrently and collect the results, instead of waiting for each one to finish before sending the next.

import asyncio, anthropic

client = anthropic.AsyncAnthropic()

async def summarize(text: str, label: str) -> str:
    msg = await client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": f"Summarise in 2 sentences: {text}"}]
    )
    return f"{label}: {msg.content[0].text}"

async def main():
    docs = [
        ("Long document one...", "doc-1"),
        ("Long document two...", "doc-2"),
        ("Long document three...", "doc-3"),
    ]
    results = await asyncio.gather(*[summarize(text, label) for text, label in docs])
    for r in results:
        print(r)

asyncio.run(main())
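To see why gather helps, the sketch below times three sequential awaits against a single gather call. It assumes a hypothetical fake_summarize that sleeps for 200 ms in place of a real request, so the timings illustrate the scheduling behaviour rather than real API latency.

```python
import asyncio
import time


async def fake_summarize(text: str, label: str) -> str:
    """Hypothetical stand-in for summarize(): sleeps instead of calling the API."""
    await asyncio.sleep(0.2)  # pretend each request takes about 200 ms
    return f"{label}: summary of {text!r}"


DOCS = [("one", "doc-1"), ("two", "doc-2"), ("three", "doc-3")]


async def sequential() -> list[str]:
    # Each await finishes before the next request starts (~0.6 s total).
    return [await fake_summarize(text, label) for text, label in DOCS]


async def concurrent() -> list[str]:
    # gather runs all three at once (~0.2 s total) and keeps input order.
    return list(await asyncio.gather(*(fake_summarize(t, lab) for t, lab in DOCS)))


def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_results, seq_time = timed(sequential)
con_results, con_time = timed(concurrent)
print(f"sequential: {seq_time:.2f}s, gather: {con_time:.2f}s")
```

gather returns results in the order the coroutines were passed in, not in completion order, so each label stays paired with its input.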

Async streaming

import asyncio, anthropic

client = anthropic.AsyncAnthropic()

async def stream_response():
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a poem about asyncio."}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)
    print()  # newline at end

asyncio.run(stream_response())
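If you also need the complete text once streaming finishes (for logging or storage), accumulate the chunks as they arrive. This sketch uses a hypothetical async generator, fake_text_stream, in place of stream.text_stream so it runs offline; the consuming async for loop is the same pattern as above.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_text_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for stream.text_stream: yields text deltas."""
    for chunk in ["Async ", "streams ", "arrive ", "in ", "pieces."]:
        await asyncio.sleep(0.01)  # simulate gaps between deltas
        yield chunk


async def stream_and_collect() -> str:
    parts: list[str] = []
    async for text in fake_text_stream():
        print(text, end="", flush=True)  # show each delta immediately
        parts.append(text)  # keep it for later use
    print()  # newline at end
    return "".join(parts)


full_text = asyncio.run(stream_and_collect())
```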

FastAPI integration

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic

app = FastAPI()
client = anthropic.AsyncAnthropic()

@app.post("/summarize")
async def summarize(body: dict):
    message = await client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": f"Summarise: {body['text']}"}]
    )
    return {"summary": message.content[0].text}

@app.post("/stream")
async def stream_chat(body: dict):
    async def generate():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": body["prompt"]}]
        ) as stream:
            async for text in stream.text_stream:
                yield text
    return StreamingResponse(generate(), media_type="text/plain")

Rate-limit-aware concurrent pool

import asyncio, anthropic

client = anthropic.AsyncAnthropic()

async def bounded_call(sem: asyncio.Semaphore, prompt: str) -> str:
    async with sem:
        msg = await client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=256,
            messages=[{"role": "user", "content": prompt}]
        )
        return msg.content[0].text

async def main(prompts: list[str], concurrency: int = 10):
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*[bounded_call(sem, p) for p in prompts])

results = asyncio.run(main(["Summarise X", "Classify Y", "Extract Z"], concurrency=5))
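A quick way to confirm the semaphore is doing its job is to count how many calls are inside the async with block at the same time. This sketch replaces the API request with a hypothetical 50 ms sleep and records the peak number of in-flight calls across 20 prompts.

```python
import asyncio


async def main(prompts: list[str], concurrency: int) -> tuple[list[str], int]:
    sem = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def bounded_fake_call(prompt: str) -> str:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.05)  # hypothetical stand-in for the API request
            in_flight -= 1
            return f"done: {prompt}"

    results = await asyncio.gather(*(bounded_fake_call(p) for p in prompts))
    return list(results), peak


results, peak = asyncio.run(main([f"prompt {i}" for i in range(20)], concurrency=5))
print(f"{len(results)} results, at most {peak} requests in flight")
# 20 results, at most 5 requests in flight
```

The counters need no lock because every coroutine runs on the same event-loop thread and only yields at the await.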

For cost modelling on concurrent async workloads, use the Claude Cost Calculator. The Prompt-Pricing Recommender helps choose Haiku vs Sonnet for high-concurrency pipelines.

Frequently asked questions

When should I use AsyncAnthropic vs Anthropic?
Use AsyncAnthropic whenever your code runs inside an async event loop (FastAPI, Starlette, asyncio scripts). Use the sync Anthropic client for plain scripts, Jupyter notebooks, or frameworks that don't use asyncio (Flask, Django without ASGI). Calling the sync client from inside a running event loop blocks the loop, stalling every other task until the request returns.
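To see why a sync call inside the loop is a problem, the sketch below compares a blocking time.sleep (standing in for a sync client request) with the same call offloaded through asyncio.to_thread (Python 3.9+). A hypothetical heartbeat task measures how late its 50 ms ticks fire; neither the sleep nor the heartbeat is part of the SDK.

```python
import asyncio
import time


async def heartbeat(gaps: list[float], ticks: int = 5) -> None:
    """Record how long each nominal 50 ms tick actually takes."""
    for _ in range(ticks):
        start = time.perf_counter()
        await asyncio.sleep(0.05)
        gaps.append(time.perf_counter() - start)


async def run(offload: bool) -> float:
    gaps: list[float] = []
    beat = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0)  # let the heartbeat start its first tick
    if offload:
        # The blocking call runs in a worker thread; the loop stays free.
        await asyncio.to_thread(time.sleep, 0.2)
    else:
        # Blocks the whole event loop for 200 ms, as a sync API call would.
        time.sleep(0.2)
    await beat
    return max(gaps)


blocked = asyncio.run(run(offload=False))
offloaded = asyncio.run(run(offload=True))
print(f"worst tick - blocking: {blocked:.2f}s, to_thread: {offloaded:.2f}s")
```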
How many concurrent Claude API requests can I make with asyncio?
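Technically unlimited from the asyncio side, but Anthropic enforces requests-per-minute (RPM) and tokens-per-minute (TPM) rate limits that depend on your usage tier, and requests over those limits fail with a 429 error. Use asyncio.Semaphore to cap concurrent requests; a cap in the 10–50 range is a reasonable starting point for lower tiers, but tune it against the limits shown for your organization in the Anthropic Console. For large jobs that don't need an immediate response, the Message Batches API is usually a better route to high throughput than raising concurrency further.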
Can I use AsyncAnthropic with Django?
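Yes, inside async views (supported since Django 3.1), ideally served under ASGI. A standard WSGI deployment does not run an asyncio event loop, so there you should use the sync client, or asyncio.run() in code outside the request path such as management commands. Inside a WSGI view, asyncio.run() still blocks the worker thread for the whole call, so it gains nothing over the sync client. If you are building mainly Claude-backed endpoints, FastAPI or Starlette is a simpler choice.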
How do I handle errors in asyncio.gather with Claude?
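By default (return_exceptions=False), asyncio.gather propagates the first exception to the caller, but it does not cancel the remaining tasks: they keep running in the background and their results are not returned to you. Pass return_exceptions=True to collect every outcome in input order, with exceptions returned as values instead of raised. Then filter the list: results = [r for r in raw if not isinstance(r, Exception)].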
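The sketch below shows return_exceptions=True splitting successes from failures. It assumes a hypothetical FakeAPIError raised by a stub call; with the real client you would typically check for anthropic.APIError subclasses such as anthropic.RateLimitError instead.

```python
import asyncio


class FakeAPIError(Exception):
    """Hypothetical stand-in for an SDK error such as a rate-limit response."""


async def fake_call(i: int) -> str:
    await asyncio.sleep(0.01)  # simulate request latency
    if i == 2:
        raise FakeAPIError(f"request {i} failed")
    return f"ok {i}"


async def main() -> tuple[list[str], list[BaseException]]:
    raw = await asyncio.gather(*(fake_call(i) for i in range(5)), return_exceptions=True)
    # Outcomes keep input order; failures appear in place as exception objects.
    successes = [r for r in raw if not isinstance(r, BaseException)]
    failures = [r for r in raw if isinstance(r, BaseException)]
    return successes, failures


successes, failures = asyncio.run(main())
print(successes)  # ['ok 0', 'ok 1', 'ok 3', 'ok 4']
print(failures)   # [FakeAPIError('request 2 failed')]
```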
