Claude Prompt Caching, Explained

How Claude's prompt caching works in 2026: what gets cached, the 5-minute TTL, cost math, and when caching breaks even.

Prompt caching is the single largest cost lever in the Claude API. Cached reads cost 10% of input pricing — a 90% discount that compounds across reuses.

What you can cache

System prompt (most common — long instructions, examples, persona)
Tool definitions (large tool catalogs)
Document context (RAG-retrieved chunks, code repositories)
Conversation history above a marked breakpoint

The 5-minute rule

By default, a cached prefix lives for 5 minutes after the most recent read. Each read refreshes the TTL. If you have steady traffic above one request every 5 minutes per prefix, the cache stays hot.

1-hour cache TTL is available at higher write cost — useful for low-traffic but expensive prefixes.

The break-even point

Cache write costs ~25% more than a fresh input token. Cache read costs 10%. So caching breaks even at ~2 reuses within the TTL window. After that, every additional reuse is pure savings.

Reuses within TTL	Effective cost vs no-cache
1	+125% (worse)
2	~67% (break-even crossed)
5	~35%
20	~16%
100	~11%

How to actually use it

In the messages API, add cache_control: { "type": "ephemeral" } to a content block. Everything before that breakpoint in the prompt gets cached. You can place up to 4 breakpoints.

Estimate your savings

The Claude Cost Calculator has a cache-utilization slider — set the fraction of input tokens that are cached and see the monthly impact.

Frequently asked questions

How long does Claude's prompt cache last?

The default TTL is 5 minutes, refreshed on each read. A 1-hour TTL is also available at a higher cache-write cost — useful when traffic is lower than one request per 5 minutes but the prefix is large and expensive.

How many cache breakpoints can I use?

Up to 4 cache breakpoints per API call. Each breakpoint marks the end of a cacheable prefix. Most teams only need 1–2: one for the system prompt and optionally one for a document context block.

Does caching work with the Batch API?

Yes — prompt caching and the Batch API stack. A cached prefix on a batch request still gets the 10% cache-read rate, and the batch 50% discount applies on top. Effective rate on cached prefixes in batch: 5% of standard input pricing.

Free tools

Cost Calculator → Prompt-Pricing Recommender → Diff Summarizer → Skills Browser →

Claude Opus 4.7 vs Sonnet 4.6 Pricing (2026 Comparison)How Much Does Claude Cost? (2026 API Pricing Guide)Claude API Cost Calculator: Estimate Your Anthropic Bill Claude vs GPT-4 Pricing: 2026 API Cost Comparison Anthropic API Pricing 2026: Full Cost Reference