Prompt Caching vs Batch API — Claude Savings

Compare Claude's two main cost-reduction features: prompt caching (90% off cached reads) and the Batch API (50% off everything). When each pays off.

Anthropic offers two distinct cost-reduction levers on top of base API pricing: prompt caching (a 90% discount on cached portions of a prompt) and the Batch API (a 50% discount on the whole request, with a 24-hour SLA). They are not mutually exclusive — but they apply to very different workloads.

How each works

Prompt caching

You mark a prefix of your prompt (system message, document context, tool definitions) as cacheable.
First request: pay cache-write price (~25% more than input).
Subsequent requests within the cache TTL (5 min or 1 hour): pay cached-read price = 10% of normal input.
Net savings appear after 2–3 reuses.

Batch API

Submit a JSONL file of up to 10,000 requests.
Get results within 24 hours (often much faster).
Pay 50% of standard pricing on input and output.
No real-time response — purely async.

When each saves more

Workload	Better lever	Why
Real-time chat with shared system prompt	Caching	Reuse pays off in 3 requests
Offline classification of 100k docs	Batch	50% off whole job, no SLA constraint
RAG with same retrieved chunks	Caching	Cache the chunks once
Eval suite on a model upgrade	Batch	Save 50% on a one-shot offline run
Long-context document Q&A (sync)	Caching	Document context cached across user questions
Nightly data enrichment pipeline	Batch	Async by definition

Can you stack them?

No — Batch API requests do not use cache. Pick the right lever for the workload. If you need sync latency, use caching. If async is acceptable, Batch usually wins because 50% off the entire request beats 90% off only the cached portion when the cached portion is small.

Plug both scenarios into the Claude Cost Calculator to compare your specific numbers.

Frequently asked questions

Can I use prompt caching and the Batch API at the same time?

No. Batch API requests do not benefit from prompt caching. For each workload, pick the lever that fits — caching for sync flows with repeated prefixes, Batch for async workloads with no real-time requirement.

How long does Claude's prompt cache last?

Anthropic offers two TTLs: 5-minute (default, cheaper cache-write price) and 1-hour (~2× cache-write cost but lasts 12× longer). Pick 1-hour for slow-paced chat sessions; 5-minute for high-frequency workloads.

Is the Batch API really 50% off?

Yes. Both input and output tokens are charged at 50% of standard real-time pricing. The only constraint is async delivery — your results land within 24 hours (often within minutes for small batches).

Free tools

Cost Calculator → Prompt-Pricing Recommender → Diff Summarizer → Skills Browser →

Claude Opus 4.7 vs Sonnet 4.6 Pricing (2026 Comparison)How Much Does Claude Cost? (2026 API Pricing Guide)Claude Prompt Caching: 90% Cost Savings Explained (2026)Claude API Cost Calculator: Estimate Your Anthropic Bill Claude vs GPT-4 Pricing: 2026 API Cost Comparison