"Should I self-host an open-weights model or pay Anthropic per token?" — this is the most common cost-architecture question for teams scaling LLM usage. Here's the honest break-even analysis for 2026.
Pure marginal cost — Claude wins below a threshold
For low-volume workloads (under a few hundred thousand tokens/day), pay-per-token APIs are cheaper than running any GPU. A single A100 or H100 sitting idle for a day typically costs more than $50 in rental fees, whereas Claude Sonnet usage at that volume comes to a few dollars at most.
The break-even calculation
Self-hosting Llama 3 70B on a cloud H100 (~$3/hour on-demand, ~$1.20/hour reserved): at about 730 hours per month that is roughly $2,200/month on-demand or $900/month reserved, so call it $1,000/month for the reserved case, even if utilization is 0%. To match that with Claude Sonnet 4.6 at $3/M input + $15/M output, you'd need to spend ~$1,000 on tokens. At a 2:1 input-to-output ratio that's ~95M input + ~48M output tokens per month, or about 1M short chat requests of roughly 140 tokens each.
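The arithmetic behind this break-even can be sketched in a few lines of Python. This is a minimal sketch, not a full cost model: the function names are hypothetical, the prices and hourly rates are the ones quoted above, and the 730 hours/month figure and the 2:1 input-to-output ratio are assumptions you should replace with your own numbers.

```python
HOURS_PER_MONTH = 730  # assumption: average month length (8,760 hours / 12)


def gpu_monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Fixed monthly rental cost of one GPU, independent of utilization."""
    return hourly_rate * hours


def api_cost(input_m: float, output_m: float,
             price_in: float = 3.0, price_out: float = 15.0) -> float:
    """API cost in dollars; token counts and prices are per million tokens."""
    return input_m * price_in + output_m * price_out


def breakeven_tokens(budget: float, in_out_ratio: float = 2.0,
                     price_in: float = 3.0, price_out: float = 15.0):
    """Millions of (input, output) tokens that a budget buys at a fixed
    input-to-output ratio (assumption: 2 input tokens per output token)."""
    output_m = budget / (in_out_ratio * price_in + price_out)
    return in_out_ratio * output_m, output_m


if __name__ == "__main__":
    print(f"Reserved H100:  ${gpu_monthly_cost(1.20):,.0f}/month")
    print(f"On-demand H100: ${gpu_monthly_cost(3.00):,.0f}/month")
    input_m, output_m = breakeven_tokens(1000.0)
    print(f"$1,000 buys about {input_m:.0f}M input + {output_m:.0f}M output tokens")
```

Changing `in_out_ratio` shifts the result noticeably, because output tokens cost five times as much as input tokens at these prices. A workload with long prompts and short answers reaches break-even at a much higher total token count than a generation-heavy one.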
Updates are a cost the GPU bill leaves out. When Llama 4 or 5 ships, you re-deploy. Anthropic updates the Claude API surface for you.
The hybrid pattern that works
Most production teams that "self-host" actually run a hybrid: open-weights models for high-volume bread-and-butter tasks (classification, simple generation), Claude API for hard tasks and as a quality fallback. To estimate the Claude portion of the bill, use the Claude Cost Calculator.
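One way to picture the hybrid pattern is a small router in front of both backends. The sketch below is illustrative only: `route`, `Result`, `SIMPLE_TASKS`, and the two backend callables are hypothetical names, not part of any real SDK, and using a confidence score as the trigger for the quality fallback is an assumption about how a team might detect weak answers.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: a backend takes a prompt and returns (text, confidence in [0, 1]).
Backend = Callable[[str], Tuple[str, float]]

# Task types treated as high-volume, bread-and-butter work (hypothetical labels).
SIMPLE_TASKS = {"classification", "simple_generation"}


@dataclass
class Result:
    text: str
    confidence: float
    backend: str


def route(task_type: str, prompt: str, local_model: Backend,
          claude_model: Backend, min_confidence: float = 0.8) -> Result:
    """Send simple tasks to the self-hosted model, everything else to Claude.

    If the self-hosted answer falls below min_confidence, fall back to Claude.
    """
    if task_type in SIMPLE_TASKS:
        text, confidence = local_model(prompt)
        if confidence >= min_confidence:
            return Result(text, confidence, "self-hosted")
    text, confidence = claude_model(prompt)
    return Result(text, confidence, "claude")


if __name__ == "__main__":
    # Stub backends standing in for real model calls.
    local = lambda prompt: ("spam", 0.95)
    claude = lambda prompt: ("detailed answer", 1.0)
    print(route("classification", "Is this spam?", local, claude).backend)
    print(route("code_review", "Review this diff", local, claude).backend)
```

In practice the fallback signal could just as well be a validation check or a schema test instead of a confidence score. The point is that only the requests the cheap model cannot handle reach the per-token API.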
Frequently asked questions
At what volume does self-hosting an open-weights model beat Claude on cost?
Roughly 1M+ short chat requests per month, or about $1,000 of Claude Sonnet 4.6 spend, is the break-even against a single reserved H100. Below that, GPU idle costs dominate and Claude wins. Above 10M requests per month, self-hosting wins decisively on marginal cost, but factor in engineering overhead.
Is Llama 3 70B as good as Claude Sonnet 4.6?
Not quite. On most published benchmarks (MMLU, HumanEval, GSM8K) Sonnet 4.6 outperforms Llama 3 70B by 5–15 points. On coding tasks specifically the gap is wider. The right comparison is per-task; benchmark on your actual workload before deciding.
What's the cheapest way to start with Claude vs self-hosting?
Start with Claude Haiku 4.5 at $1 input / $5 output per million tokens — by far the lowest barrier to entry. Move to Sonnet when quality demands it, and only consider self-hosting once a single workload exceeds ~$2,000/month of Claude spend with stable usage.