The five-times price gap between Opus and Sonnet makes the choice between them the most consequential routing decision in any Claude deployment. Here is the heuristic.
Default to Sonnet
Sonnet 4.6 handles the vast majority of production workloads at production quality. Unless you have a specific reason to use Opus, you do not have a reason to use Opus.
Use Opus when…
Multi-step agentic flows. Error rates compound across steps. An agent that takes 10 actions at 95% per-step accuracy succeeds about 60% of the time; at 99%, about 90% (assuming each step fails independently). Opus's per-step edge compounds.
Code generation in large, unfamiliar repos. Sonnet hallucinates symbol names more often.
Long-context (>100k tokens) retrieval with multi-hop reasoning. Opus retrieves and chains better.
High-stakes decisions. Legal review, medical triage, financial analysis. The cost of one bad answer dwarfs the price gap.
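The compounding arithmetic in the agentic-flow case above can be checked with a few lines of Python. This is a minimal sketch that assumes every step succeeds or fails independently, which real agents only approximate; the function name `chain_success_rate` is made up for illustration.

```python
def chain_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    return per_step_accuracy ** steps


# The two accuracy levels quoted in the text, over a 10-action run.
for accuracy in (0.95, 0.99):
    rate = chain_success_rate(accuracy, 10)
    print(f"{accuracy:.0%} per step over 10 steps: {rate:.1%} end-to-end")
```

The same function shows how quickly the gap widens with longer runs: raising `steps` lowers the end-to-end rate for both accuracies, but the lower one falls much faster.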
Hybrid: route at runtime
The most cost-effective production pattern is to default to Sonnet and escalate to Opus when a confidence check fails. The Prompt-Pricing Recommender shows how to classify prompts ahead of time so you skip the Sonnet call entirely on the obvious-Opus prompts.
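A rough sketch of the default-then-escalate pattern follows. It is not tied to any SDK: `call_sonnet`, `call_opus`, and `is_confident` are hypothetical callables you would supply yourself (for example thin wrappers around your API client and whatever confidence check you trust), and the `Routed` type exists only for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Routed:
    answer: str
    model: str  # which tier produced the final answer


def route_with_escalation(
    prompt: str,
    call_sonnet: Callable[[str], str],         # hypothetical: returns Sonnet's answer
    call_opus: Callable[[str], str],           # hypothetical: returns Opus's answer
    is_confident: Callable[[str, str], bool],  # hypothetical: check on (prompt, answer)
) -> Routed:
    """Try the cheaper model first; escalate only when the check fails."""
    draft = call_sonnet(prompt)
    if is_confident(prompt, draft):
        return Routed(draft, "sonnet")
    # The Sonnet call has already been paid for at this point.
    return Routed(call_opus(prompt), "opus")


# Stub usage: the check here simply rejects empty answers.
result = route_with_escalation(
    "Summarize this ticket.",
    call_sonnet=lambda p: "",
    call_opus=lambda p: "A fuller answer.",
    is_confident=lambda p, a: bool(a.strip()),
)
print(result.model)
```

Escalated requests pay for both calls, which is why classifying prompts ahead of time pays off on the obvious-Opus cases.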
Is it ever worth using only Opus for all requests?
Rarely at scale. All-Opus is 5× the cost of all-Sonnet for the same request volume. The typical break-even is when your task failure rate on Sonnet exceeds ~20% — rare for well-structured prompts. Most teams cap Opus at 10–30% of traffic.
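To see what an Opus traffic cap means for the bill, the blended cost can be worked out relative to all-Sonnet. The sketch below takes the 5× multiple from the text and assumes every request costs the same within a tier, which ignores differences in token counts; `blended_cost` and its `pay_sonnet_first` flag are illustrative names.

```python
OPUS_PRICE_MULTIPLE = 5.0  # from the text: Opus costs about 5x Sonnet per request


def blended_cost(opus_share: float, pay_sonnet_first: bool = False) -> float:
    """Cost per request relative to all-Sonnet (which is 1.0).

    opus_share: fraction of requests that end up on Opus.
    pay_sonnet_first: True for escalation, where every request pays for a
        Sonnet call and escalated ones pay for Opus on top.
    """
    if not 0.0 <= opus_share <= 1.0:
        raise ValueError("opus_share must be between 0 and 1")
    sonnet_share = 1.0 if pay_sonnet_first else 1.0 - opus_share
    return sonnet_share + opus_share * OPUS_PRICE_MULTIPLE


for share in (0.10, 0.30, 1.00):
    print(
        f"{share:.0%} Opus: {blended_cost(share):.1f}x pre-classified, "
        f"{blended_cost(share, pay_sonnet_first=True):.1f}x with escalation"
    )
```

Under these assumptions a 10–30% Opus share costs roughly 1.4× to 2.2× all-Sonnet when prompts are classified up front, and 1.5× to 2.5× when every request tries Sonnet first.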
How do I know if my Sonnet results are good enough?
Build an eval set of 50–200 labeled examples and measure each model's pass rate against your acceptance criterion. If Sonnet passes around 80% and Opus around 95%, whether that gap justifies 5× the cost depends on what a failure costs you.
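A pass-rate measurement of this kind might look like the sketch below. The `generate` and `passes` callables are placeholders: `generate` stands in for a call to whichever model is under test, and `passes` encodes your own acceptance criterion, which could be an exact match, a rubric, or a grader model.

```python
from typing import Callable, Sequence, Tuple


def pass_rate(
    examples: Sequence[Tuple[str, str]],  # (prompt, expected) pairs
    generate: Callable[[str], str],       # hypothetical: calls the model under test
    passes: Callable[[str, str], bool],   # hypothetical: acceptance check (output, expected)
) -> float:
    """Fraction of labeled examples whose output meets the acceptance criterion."""
    if not examples:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, expected in examples if passes(generate(prompt), expected))
    return passed / len(examples)


# Toy usage with a stub "model" and a case-insensitive exact-match criterion.
eval_set = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
stub_answers = {"2+2": "4", "capital of France": "paris", "3*3": "6"}
rate = pass_rate(
    eval_set,
    generate=lambda prompt: stub_answers[prompt],
    passes=lambda output, expected: output.strip().lower() == expected.lower(),
)
print(f"pass rate: {rate:.0%}")
```

Running the same eval set through both models with the same `passes` function gives the two numbers to compare.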
Can I run Opus and Sonnet in parallel and pick the better answer?
Yes. Some teams do speculative routing: send to both, take the cheaper model's answer if it passes a confidence or self-check, else fall back to the expensive one. This adds latency to the cheap path but can reduce median cost by 60–70%.
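One way to sketch the parallel variant is with `asyncio`. Everything model-specific here is assumed: `call_cheap` and `call_expensive` are hypothetical async wrappers around your client, and `passes_check` is whatever confidence or self-check you use. Note that in this form both requests are actually issued, so any cost saving depends on whether your provider bills a request that is cancelled in flight; treat that as something to verify rather than a given.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def speculative_route(
    prompt: str,
    call_cheap: Callable[[str], Awaitable[str]],      # hypothetical async wrapper
    call_expensive: Callable[[str], Awaitable[str]],  # hypothetical async wrapper
    passes_check: Callable[[str], bool],              # hypothetical confidence/self-check
) -> Tuple[str, str]:
    """Return (answer, tier), preferring the cheap answer when it passes."""
    # Start the expensive call right away so a fallback adds no extra wait.
    expensive_task = asyncio.create_task(call_expensive(prompt))
    try:
        cheap_answer = await call_cheap(prompt)
    except Exception:
        expensive_task.cancel()
        raise
    if passes_check(cheap_answer):
        # Cancelling stops waiting locally; it may not stop billing upstream.
        expensive_task.cancel()
        return cheap_answer, "cheap"
    return await expensive_task, "expensive"


if __name__ == "__main__":
    async def cheap(prompt: str) -> str:
        return "short answer"

    async def expensive(prompt: str) -> str:
        await asyncio.sleep(0.01)
        return "thorough answer"

    print(asyncio.run(speculative_route("q", cheap, expensive, lambda a: len(a) > 20)))
```

If the goal is cost rather than latency, the sequential version, which calls the expensive model only after the check fails, avoids issuing the second request at all.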