All three Claude tiers in 2026 — Opus 4.7, Sonnet 4.6, and Haiku 4.5 — share a 200,000-token context window. This is one of the largest offered by any commercial LLM API.
What fits in 200K tokens?
Content type
Approximate token count
Fits in 200K?
Average English novel (80K words)
~107K tokens
Yes (1.9×)
1,000-page PDF (250K words)
~333K tokens
No — split into chunks
10K-line codebase
~50–80K tokens
Yes (2.5–4×)
1-hour meeting transcript
~10–15K tokens
Yes (13–20×)
200-page technical doc
~60K tokens
Yes (3×)
Rule of thumb: 1 English word ≈ 1.3 tokens. 1 line of code ≈ 5–8 tokens.
Context window and pricing
Every token in the context window is a billable input token — including conversation history and documents. A full 200K-token context on Sonnet 4.6 costs $0.60 per request (200K × $3/M). On Opus 4.7, the same window costs $3.00 per request.
Use prompt caching for repeated large contexts: cache the static document portion once, pay 90% less on all subsequent reads. See the prompt caching guide for break-even math.
Strategies for large-context workloads
Cache the static part — prefix documents with cache_control: ephemeral. Only new user messages and model responses are billed at full rate.
Route by window size — use Haiku for short contexts (under 20K tokens), Sonnet for medium (20–100K), Opus only when maximum reasoning quality is required regardless of length.
Truncate conversation history — RAG and agentic workloads accumulate history fast. Keep only the last N turns of chat plus the cached system context.
Use the Batch API for bulk long-context jobs — 50% off standard pricing with a 24-hour SLA. Offline summarisation and extraction workloads are ideal. See the Batch API guide.
Estimate your long-context costs
Plug your expected context size and request volume into the Claude Cost Calculator to model monthly spend with and without caching enabled.
Frequently asked questions
Is Claude's 200K context window the same across all models?
Yes. Opus 4.7, Sonnet 4.6, and Haiku 4.5 all support 200K input tokens in 2026. The difference is quality and price, not context length.
Does a longer context make Claude slower?
Yes — time-to-first-token scales roughly linearly with input length. For latency-sensitive applications, keep context concise or use streaming so the user sees partial output immediately.
What happens if I exceed 200K tokens?
The API returns a 400 error (context_length_exceeded). You must truncate or summarise earlier turns. There is no automatic truncation.
How many pages of a PDF can Claude read at once?
Roughly 150–200 pages of dense English text fits in 200K tokens (assuming ~1,000–1,300 tokens per page). For larger documents, split into chunks with overlap, or use a retrieval layer to pass only the relevant sections.