Prompt Caching vs Batch API — Claude Savings

Compare Claude's two main cost-reduction features: prompt caching (90% off cached reads) and the Batch API (50% off everything). When each pays off.

🔥 Launch tonight — Power Prompts PDF 50p (just 50p tonight)30 battle-tested Claude Code prompts · 8 pages · paste into CLAUDE.md · price reverts to £5

Anthropic offers two distinct cost-reduction levers on top of base API pricing: prompt caching (a 90% discount on cached portions of a prompt) and the Batch API (a 50% discount on the whole request, with a 24-hour SLA). They are not mutually exclusive — but they apply to very different workloads.

How each works

Prompt caching

Batch API

When each saves more

WorkloadBetter leverWhy
Real-time chat with shared system promptCachingReuse pays off in 3 requests
Offline classification of 100k docsBatch50% off whole job, no SLA constraint
RAG with same retrieved chunksCachingCache the chunks once
Eval suite on a model upgradeBatchSave 50% on a one-shot offline run
Long-context document Q&A (sync)CachingDocument context cached across user questions
Nightly data enrichment pipelineBatchAsync by definition

Can you stack them?

No — Batch API requests do not use cache. Pick the right lever for the workload. If you need sync latency, use caching. If async is acceptable, Batch usually wins because 50% off the entire request beats 90% off only the cached portion when the cached portion is small.

Plug both scenarios into the Claude Cost Calculator to compare your specific numbers.

Frequently asked questions

Can I use prompt caching and the Batch API at the same time?
No. Batch API requests do not benefit from prompt caching. For each workload, pick the lever that fits — caching for sync flows with repeated prefixes, Batch for async workloads with no real-time requirement.
How long does Claude's prompt cache last?
Anthropic offers two TTLs: 5-minute (default, cheaper cache-write price) and 1-hour (~2× cache-write cost but lasts 12× longer). Pick 1-hour for slow-paced chat sessions; 5-minute for high-frequency workloads.
Is the Batch API really 50% off?
Yes. Both input and output tokens are charged at 50% of standard real-time pricing. The only constraint is async delivery — your results land within 24 hours (often within minutes for small batches).

Free tools

Cost Calculator → Prompt-Pricing Recommender → Diff Summarizer → Skills Browser →

Related

Claude Opus 4.7 vs Sonnet 4.6 Pricing (2026 Comparison)How Much Does Claude Cost? (2026 API Pricing Guide)Claude Prompt Caching: 90% Cost Savings Explained (2026)Claude API Cost Calculator: Estimate Your Anthropic BillClaude vs GPT-4 Pricing: 2026 API Cost Comparison