Claude API Batch vs Streaming: Cost, Latency, and When to Pick Each

Side-by-side comparison of Claude API batch and streaming modes in 2026: per-token cost, latency profile, throughput, and the workloads each one fits.

The Claude API has two delivery modes that look similar on the surface but have completely different economics. Streaming sends tokens to your client in real time over Server-Sent Events at standard list pricing. Batch submits a JSONL file of requests, processes them asynchronously within 24 hours, and discounts the price by 50%. The choice between them is one of the easiest cost wins available, and gets made wrong on roughly half the production teams we look at.

The headline pricing

Model	Streaming input/output	Batch input/output	Savings
Opus 4.7	$15 / $75 per M	$7.50 / $37.50 per M	50%
Sonnet 4.6	$3 / $15 per M	$1.50 / $7.50 per M	50%
Haiku 4.5	$1 / $5 per M	$0.50 / $2.50 per M	50%

Note: streaming has the same per-token cost as non-streaming on-demand requests. Streaming doesn't cost more — it just delivers the same tokens incrementally. The cost decision is not streaming vs non-streaming; it's on-demand (streaming or otherwise) vs batch.

Latency profile

When to pick streaming (or any on-demand mode)

When to pick batch (50% off)

Submitting a batch (Python)

Stacking with caching

Batch and prompt caching stack. A cached prefix on a batch request bills at 5% of standard input price (10% cache rate × 50% batch discount). For high-volume eval suites with a shared system prompt, this is a 95% effective discount on input.

Worked example: 1M-request eval

Suppose you're scoring 1M model outputs, each averaging 2000 input + 200 output tokens, on Sonnet 4.6:

What batch is not for

Frequently asked questions

Is the Claude streaming API more expensive than non-streaming?

No. Streaming has the same per-token price as non-streaming on-demand requests on every Claude model. The price difference is between on-demand and Batch: Batch is 50% cheaper but delivers results asynchronously within 24 hours. Streaming is purely a delivery-mode choice, not a pricing tier.

Can I stream Claude responses in batch mode?

No — batch returns the complete response for each request after the entire job finishes. If you need token-by-token streaming, you must use the on-demand Messages API at standard pricing. Batch is exclusively for non-interactive bulk workloads.

How much does the Claude Batch API save?

A flat 50% off standard list pricing for every model: Opus 4.7 drops from $15/$75 to $7.50/$37.50, Sonnet 4.6 from $3/$15 to $1.50/$7.50, Haiku 4.5 from $1/$5 to $0.50/$2.50 per million tokens. Stacks with prompt caching for a combined ~95% input-token discount on reusable prefixes.

What is the latency of the Claude Batch API?

Anthropic commits to results within 24 hours. Typical real-world completion is 1–4 hours for jobs up to 10,000 requests. There is no SLA for completion under that ceiling, so do not use batch when you need results in minutes.

Free tools

Claude API Batch vs Streaming Cost