Sonnet 4.6 and Haiku 4.5 cover the bottom 80% of production Claude workloads. The price gap is 3× on input and 3× on output — meaningful at scale but small in absolute terms for low-volume apps. Here is how to choose.
Pricing side-by-side (per million tokens)
Model
Input
Output
Cached read
Cache write (5m)
Sonnet 4.6
$3
$15
$0.30
$3.75
Haiku 4.5
$1
$5
$0.10
$1.25
Use Haiku 4.5 when
The task is templated. Classification, extraction with a tight JSON schema, language detection, simple summarization.
Latency matters more than depth. Haiku is markedly faster end-to-end — useful in autocomplete, chat suggestions, real-time UIs.
Volume is high and value-per-request is low. 10M requests/month at a few-hundred-tokens-each will save thousands on Haiku over Sonnet.
Use Sonnet 4.6 when
Reasoning over multi-step inputs. Tool-calling chains, code editing, agent flows.
Long context. Sonnet retrieves and chains 50–200k token contexts more reliably than Haiku.
Unfamiliar domain. Anything where the prompt assumes specialized knowledge — legal, medical, niche engineering — favors Sonnet's deeper world model.
Concrete routing pattern
A common pattern in production: Haiku for the first pass (classify intent, extract structured fields), then escalate to Sonnet only when Haiku's confidence is low or the task type is "open-ended generation." This cuts spend 40–70% on mixed workloads without measurable quality regressions.
Estimate your savings
Plug your workload mix into the Claude Cost Calculator — toggle between Sonnet and Haiku to see the monthly USD delta. For per-prompt routing decisions, the Prompt-Pricing Recommender takes a prompt and recommends the cheapest tier likely to handle it.
Frequently asked questions
Is Haiku 4.5 always cheaper than Sonnet 4.6?
Yes — Haiku is 3× cheaper on both input ($1 vs $3 per million tokens) and output ($5 vs $15 per million). The savings compound on cached reads too ($0.10 vs $0.30 per million).
Can Haiku 4.5 handle tool use?
Yes. Haiku supports function calling and tool use with the same API surface as Sonnet and Opus. For simple single-tool flows Haiku is often sufficient; for multi-tool chains where one mistake compounds, Sonnet is safer.
When should I escalate from Haiku to Sonnet automatically?
Common triggers: Haiku returns malformed JSON when the schema is strict, classifier confidence drops below 0.6, or the output is shorter than expected. Implement this as a retry-with-Sonnet wrapper rather than a global switch — most traffic still routes to Haiku.