The Claude API has two delivery modes that look similar on the surface but have completely different economics. Streaming sends tokens to your client in real time over Server-Sent Events at standard list pricing. Batch (the Message Batches API) accepts many requests in a single submission, processes them asynchronously within 24 hours, returns the results as JSONL, and charges 50% less. Choosing between them is one of the easiest cost wins available, yet roughly half the production teams we look at get it wrong.
The headline pricing
| Model | Streaming input / output ($ per M tokens) | Batch input / output ($ per M tokens) | Savings |
|---|---|---|---|
| Opus 4.7 | $15 / $75 | $7.50 / $37.50 | 50% |
| Sonnet 4.6 | $3 / $15 | $1.50 / $7.50 | 50% |
| Haiku 4.5 | $1 / $5 | $0.50 / $2.50 | 50% |
Note: streaming has the same per-token cost as non-streaming on-demand requests. Streaming doesn't cost more — it just delivers the same tokens incrementally. The cost decision is not streaming vs non-streaming; it's on-demand (streaming or otherwise) vs batch.
Latency profile
Streaming: the first token typically arrives 300–900 ms after the request, then tokens flow at 40–100 per second depending on the model. Total latency for a 500-token answer is therefore roughly 5–13 seconds, although the user starts reading within the first second.
Batch: no per-request latency commitment. Anthropic processes each batch within a 24-hour window, and any requests still unprocessed when that window closes expire rather than complete. In practice, typical-size jobs finish in 1–4 hours.
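The streaming total is just time-to-first-token plus generation time. The rough estimator below plugs in the ranges quoted for streaming. It ignores network jitter and treats every token as generated at a constant rate, which is a simplifying assumption.

```python
def estimate_latency_s(output_tokens: int, ttft_s: float, tokens_per_s: float) -> float:
    """Rough end-to-end latency: time to first token plus generation time."""
    return ttft_s + output_tokens / tokens_per_s


# Best case: fast first token, fast model. Worst case: slow first token, slow model.
best = estimate_latency_s(500, ttft_s=0.3, tokens_per_s=100)
worst = estimate_latency_s(500, ttft_s=0.9, tokens_per_s=40)
print(f"{best:.1f}s to {worst:.1f}s")
```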
When to pick streaming (or any on-demand mode)
User-facing chat. Anything where a human is reading the output as it's generated.
Agents. Each step's output is the next step's input — you cannot batch a sequential agent.
Interactive coding. The user is waiting on generated code before they can compile or run it.
Sub-second SLA paths. Anything with a hard latency requirement.
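For the on-demand cases above, here is a minimal streaming sketch. It uses the Python SDK's `client.messages.stream()` context manager, whose `text_stream` iterator yields text deltas as they arrive. The function name `stream_reply` and the `max_tokens` value are illustrative choices. The client is passed in so the function can be exercised without a network call.

```python
import sys


def stream_reply(client, prompt: str, model: str = "claude-sonnet-4-6") -> str:
    """Print the reply incrementally as it arrives and return the full text.

    `client` is expected to be an anthropic.Anthropic() instance.
    """
    parts = []
    with client.messages.stream(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:  # text deltas, in generation order
            sys.stdout.write(text)
            sys.stdout.flush()
            parts.append(text)
    return "".join(parts)
```

Called as `stream_reply(anthropic.Anthropic(), "...")`, it bills at the same on-demand rate as a non-streaming `client.messages.create()` call.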
When to pick batch (50% off)
Evals and offline scoring. Run 10,000 prompts through Claude to grade another model's outputs.
Bulk extraction. Pull structured fields from a document corpus overnight.
Classification backfills. Tag a year of historical content.
Synthetic data generation. Produce training/fine-tuning data offline.
Other bulk precomputation. Claude doesn't produce embeddings, but batch suits any other one-shot bulk job whose results can wait a few hours.
Submitting a batch (Python)
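```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder workload: replace with your own list of prompt strings
prompts = ["Classify the sentiment of: ...", "Extract the invoice total from: ..."]

# One request per prompt; custom_id is how you match results back to inputs
requests = [
    {
        "custom_id": f"req-{i}",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(prompts)
]

batch = client.messages.batches.create(requests=requests)
print(batch.id, batch.processing_status)

# Poll for completion (processing_status becomes "ended" when the batch is done)
while batch.processing_status != "ended":
    time.sleep(60)
    batch = client.messages.batches.retrieve(batch.id)

# Download results; order is not guaranteed, so key on custom_id
for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(result.custom_id, result.result.message.content[0].text)
    else:
        # errored, canceled, or expired requests carry no message
        print(result.custom_id, result.result.type)
```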
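A single batch is capped at 100,000 requests or 256 MB, whichever comes first (check Anthropic's docs for current limits). Larger jobs therefore have to be split across several batches. The sketch below splits by request count only and does not check payload size. `chunked` and `MAX_REQUESTS_PER_BATCH` are hypothetical names.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")

MAX_REQUESTS_PER_BATCH = 100_000  # request-count cap; the 256 MB size cap is not checked here


def chunked(items: List[T], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage with a client and a request list built as in the submission example:
# batch_ids = [client.messages.batches.create(requests=part).id
#              for part in chunked(requests)]
```

Each part is submitted and polled independently, so keep custom_id values unique across the whole job, not just within one batch.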
Stacking with caching
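Batch and prompt caching stack. A cache read on a batch request bills at 5% of the standard input price (the 10% cache-read rate × the 50% batch discount). For high-volume eval suites with a shared system prompt, that is a 95% discount on the cached portion of each input. Uncached input tokens still get the plain 50%, and cache writes bill at a premium over the base input rate (1.25× for the default 5-minute cache, before the batch discount). Cache hits within a batch are best-effort, so the realized saving depends on how many requests actually hit the cache.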
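Because the discounts multiply, the blended input rate depends on the share of tokens that are cache reads versus cache writes. The sketch below assumes a 10% cache-read multiplier, a 1.25× cache-write multiplier, and the 50% batch discount. The function name and the `read_frac`/`write_frac` split are illustrative, not API parameters.

```python
CACHE_READ_MULT = 0.10   # cache hits bill at 10% of base input
CACHE_WRITE_MULT = 1.25  # 5-minute cache writes bill at 125% of base input
BATCH_MULT = 0.50        # batch halves every rate


def blended_input_rate(base_per_m: float, read_frac: float, write_frac: float = 0.0,
                       batch: bool = True) -> float:
    """Blended $ per million input tokens, given the shares of input tokens that
    are cache reads and cache writes (the remainder bill at the base rate)."""
    if not 0 <= read_frac + write_frac <= 1:
        raise ValueError("read_frac + write_frac must be between 0 and 1")
    uncached = 1.0 - read_frac - write_frac
    mult = uncached + read_frac * CACHE_READ_MULT + write_frac * CACHE_WRITE_MULT
    return base_per_m * mult * (BATCH_MULT if batch else 1.0)


# Sonnet 4.6 input at $3/M
print(f"${blended_input_rate(3.0, read_frac=1.0):.4f}/M")   # all cached, batched: 5% of $3
print(f"${blended_input_rate(3.0, read_frac=0.75):.4f}/M")  # 75% of input tokens cached
```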
Worked example: 1M-request eval
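Suppose you're scoring 1M model outputs, each averaging 2,000 input + 200 output tokens, on Sonnet 4.6. That's 2,000M input tokens and 200M output tokens. On-demand, the job costs 2,000M × $3/M + 200M × $15/M = $6,000 + $3,000 = $9,000. Through batch it costs $3,000 + $1,500 = $4,500: half the bill for the same outputs.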
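Caching can push the total further. As a hypothetical, suppose 1,500 of the 2,000 input tokens per request are a shared grading rubric that hits the cache on every request. The sketch below ignores the one-off cache writes and the best-effort nature of batch cache hits, so treat its third figure as an upper bound on the saving. All constant names are illustrative.

```python
N = 1_000_000                    # requests
IN_TOK, OUT_TOK = 2_000, 200     # average tokens per request
IN_RATE, OUT_RATE = 3.00, 15.00  # Sonnet 4.6 list price, $ per million tokens
CACHED_TOK = 1_500               # hypothetical shared prefix served from cache


def job_cost(batch: bool, cached_tokens: int = 0) -> float:
    """Total $ for the job; cache reads bill at 10% of the input rate."""
    scale = 0.5 if batch else 1.0
    uncached = IN_TOK - cached_tokens
    per_request = (uncached * IN_RATE
                   + cached_tokens * IN_RATE * 0.10
                   + OUT_TOK * OUT_RATE) / 1_000_000
    return N * per_request * scale


for label, cost in [
    ("on-demand", job_cost(batch=False)),
    ("batch", job_cost(batch=True)),
    ("batch + cache", job_cost(batch=True, cached_tokens=CACHED_TOK)),
]:
    print(f"{label:>14}: ${cost:,.0f}")
```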
Frequently asked questions
Is the Claude streaming API more expensive than non-streaming?
No. Streaming has the same per-token price as non-streaming on-demand requests on every Claude model. The price difference is between on-demand and Batch: Batch is 50% cheaper but delivers results asynchronously within 24 hours. Streaming is purely a delivery-mode choice, not a pricing tier.
Can I stream Claude responses in batch mode?
No — batch returns the complete response for each request after the entire job finishes. If you need token-by-token streaming, you must use the on-demand Messages API at standard pricing. Batch is exclusively for non-interactive bulk workloads.
How much does the Claude Batch API save?
A flat 50% off standard list pricing for every model: Opus 4.7 drops from $15/$75 to $7.50/$37.50, Sonnet 4.6 from $3/$15 to $1.50/$7.50, Haiku 4.5 from $1/$5 to $0.50/$2.50 per million tokens. Stacks with prompt caching for a combined ~95% input-token discount on reusable prefixes.
What is the latency of the Claude Batch API?
Anthropic processes batches within a 24-hour window, and any requests not completed by then expire. Typical real-world completion is 1–4 hours for jobs up to 10,000 requests. There is no SLA for completion under that ceiling, so do not use batch when you need results in minutes.