Claude API Batch vs Streaming Cost

Side-by-side comparison of Claude API batch and streaming modes in 2026: per-token cost, latency profile, throughput, and the workloads each one fits.


The Claude API has two delivery modes that look similar on the surface but have completely different economics. Streaming sends tokens to your client in real time over Server-Sent Events at standard list pricing. Batch submits a list of requests in a single API call, processes them asynchronously within 24 hours, returns the results as a JSONL stream, and discounts the price by 50%. The choice between them is one of the easiest cost wins available, and roughly half the production teams we look at get it wrong.

The headline pricing

Model        Streaming input/output    Batch input/output      Savings
Opus 4.7     $15 / $75 per M           $7.50 / $37.50 per M    50%
Sonnet 4.6   $3 / $15 per M            $1.50 / $7.50 per M     50%
Haiku 4.5    $1 / $5 per M             $0.50 / $2.50 per M     50%

Prices are in USD per million (M) tokens.

Note: streaming has the same per-token cost as non-streaming on-demand requests. Streaming doesn't cost more — it just delivers the same tokens incrementally. The cost decision is not streaming vs non-streaming; it's on-demand (streaming or otherwise) vs batch.
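
The table reduces to a small lookup. The sketch below is illustrative only: it assumes the list prices quoted above and a flat 50% batch discount, and the `PRICES` keys and the `request_cost` helper are hypothetical names, not API model IDs or part of the Anthropic SDK.

```python
# Illustrative only: USD per million tokens, copied from the table above.
PRICES = {
    "opus-4.7": (15.00, 75.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}
BATCH_DISCOUNT = 0.5  # batch bills at half the on-demand rate


def request_cost(model, input_tokens, output_tokens, batch=False):
    """USD cost of one request. Streaming and non-streaming share the on-demand rate."""
    input_price, output_price = PRICES[model]
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost


on_demand = request_cost("sonnet-4.6", 2_000, 200)
batched = request_cost("sonnet-4.6", 2_000, 200, batch=True)
print(f"on-demand: ${on_demand:.4f}  batch: ${batched:.4f}")
```

There is deliberately no `stream` parameter: under the pricing described here, streaming changes how tokens arrive, not what they cost.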

Latency profile

When to pick streaming (or any on-demand mode)

When to pick batch (50% off)

Submitting a batch (Python)

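import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder input; replace with your own list of prompts
prompts = ["Summarize the plot of Hamlet in one sentence."]

# One entry per request; custom_id lets you match results back to inputs
requests = [
    {
        "custom_id": f"req-{i}",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        }
    }
    for i, prompt in enumerate(prompts)
]

batch = client.messages.batches.create(requests=requests)
print(batch.id, batch.processing_status)

# Poll until the whole batch has finished processing
while batch.processing_status != "ended":
    time.sleep(60)
    batch = client.messages.batches.retrieve(batch.id)

# Read results; only succeeded requests carry a message
for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(result.custom_id, result.result.message.content)
    else:
        print(result.custom_id, result.result.type)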
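
A single batch has a size cap, so a large job is usually split across several batches. The sketch below assumes a cap of 100,000 requests per batch; verify that figure against the current Anthropic documentation, and note that the separate payload-size limit is not handled here. `chunk_requests` is a hypothetical helper, not an SDK function.

```python
from itertools import islice

MAX_REQUESTS_PER_BATCH = 100_000  # assumption: check the current documented limit


def chunk_requests(requests, size=MAX_REQUESTS_PER_BATCH):
    """Yield successive lists of at most `size` requests."""
    iterator = iter(requests)
    while chunk := list(islice(iterator, size)):
        yield chunk


# Small demo with stand-in request dicts and a tiny chunk size
demo = [{"custom_id": f"req-{i}"} for i in range(250)]
chunks = list(chunk_requests(demo, size=100))
print([len(chunk) for chunk in chunks])
```

Each chunk would then go to its own `client.messages.batches.create(requests=chunk)` call, with the `custom_id` values tying results back to inputs across batches.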

Stacking with caching

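Batch and prompt caching stack. A cached prefix read on a batch request bills at 5% of the standard input price (10% cache-read rate × 50% batch discount). For high-volume eval suites with a shared system prompt, this is a 95% effective discount on the cached portion of the input; tokens outside the cached prefix still bill at the plain batch rate. Cache hits inside a batch are best-effort, so treat 95% as a ceiling rather than a guarantee.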
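
The multiplication can be made explicit. This is a minimal sketch using the rates quoted in this section (10% for cache reads, 50% for batch) and the Sonnet 4.6 input price from the table; `effective_input_price` is a hypothetical helper, and the sketch ignores the premium charged when a cache entry is first written.

```python
SONNET_INPUT = 3.00     # USD per million input tokens, from the pricing table
CACHE_READ_RATE = 0.10  # cached reads bill at 10% of the input price
BATCH_RATE = 0.50       # batch bills at 50% of the on-demand price


def effective_input_price(base, cached=False, batch=False):
    """Per-million input price after applying whichever discounts are in play."""
    rate = 1.0
    if cached:
        rate *= CACHE_READ_RATE
    if batch:
        rate *= BATCH_RATE
    return base * rate


for cached in (False, True):
    for batch in (False, True):
        price = effective_input_price(SONNET_INPUT, cached=cached, batch=batch)
        print(f"cached={cached!s:5} batch={batch!s:5} ${price:.2f} per M")
```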

Worked example: 1M-request eval

Suppose you're scoring 1M model outputs, each averaging 2000 input + 200 output tokens, on Sonnet 4.6:
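
The arithmetic for this scenario can be sketched as follows. The request count, token counts, and Sonnet 4.6 prices come from the text; the size of the cached prefix (1,500 of the 2,000 input tokens) is a hypothetical assumption, as is the optimistic premise that every request hits the cache and that cache-write costs are negligible.

```python
REQUESTS = 1_000_000
INPUT_TOKENS = 2_000
OUTPUT_TOKENS = 200
CACHED_PREFIX_TOKENS = 1_500  # hypothetical: shared system prompt and rubric

INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00  # Sonnet 4.6, USD per million tokens
BATCH_RATE, CACHE_READ_RATE = 0.50, 0.10


def total_cost(batch=False, cached_prefix=0):
    """Total USD for the whole eval under the stated assumptions."""
    batch_rate = BATCH_RATE if batch else 1.0
    uncached = INPUT_TOKENS - cached_prefix
    per_request_micro_usd = (
        uncached * INPUT_PRICE
        + cached_prefix * INPUT_PRICE * CACHE_READ_RATE
        + OUTPUT_TOKENS * OUTPUT_PRICE
    ) * batch_rate
    return per_request_micro_usd * REQUESTS / 1_000_000


on_demand = total_cost()
batch_only = total_cost(batch=True)
batch_cached = total_cost(batch=True, cached_prefix=CACHED_PREFIX_TOKENS)

for label, cost in [("on-demand", on_demand), ("batch", batch_only), ("batch + cache", batch_cached)]:
    print(f"{label:14} ${cost:,.0f}  ({1 - cost / on_demand:.1%} saved)")
```

Under these assumptions the eval costs about $9,000 on demand, $4,500 in batch, and roughly $2,475 with batch plus caching. The last figure moves with the cached fraction, so rerun the sketch with your own prefix length.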

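The 50% headline saving is real; layering prompt caching on a shared prefix cuts the remaining bill substantially again, by an amount that depends on how much of each request's input is cacheable.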

What batch is not for

Cross-reference the deeper batch API guide and the streaming vs non-streaming page.

Frequently asked questions

Is the Claude streaming API more expensive than non-streaming?
No. Streaming has the same per-token price as non-streaming on-demand requests on every Claude model. The price difference is between on-demand and Batch: Batch is 50% cheaper but delivers results asynchronously within 24 hours. Streaming is purely a delivery-mode choice, not a pricing tier.
Can I stream Claude responses in batch mode?
No — batch returns the complete response for each request after the entire job finishes. If you need token-by-token streaming, you must use the on-demand Messages API at standard pricing. Batch is exclusively for non-interactive bulk workloads.
How much does the Claude Batch API save?
A flat 50% off standard list pricing for every model: Opus 4.7 drops from $15/$75 to $7.50/$37.50, Sonnet 4.6 from $3/$15 to $1.50/$7.50, and Haiku 4.5 from $1/$5 to $0.50/$2.50 per million tokens. The discount stacks with prompt caching for a combined ~95% discount on the cached input tokens of a reusable prefix.
What is the latency of the Claude Batch API?
Anthropic commits to results within 24 hours. Typical real-world completion is 1–4 hours for jobs up to 10,000 requests. There is no SLA for completion under that ceiling, so do not use batch when you need results in minutes.
