The hidden cost of reasoning models

The first time you swap a chat model for a reasoning model, the answer quality jumps so noticeably it feels like cheating. The second time you look at the bill, it feels less like cheating and more like a tax. This post is about the tax.

What a reasoning model actually charges you for

Reasoning models — OpenAI's o-series, Claude with extended thinking enabled, Gemini Thinking — generate two streams of tokens per request:

Reasoning tokens: a hidden internal monologue where the model works through the problem. You don't see them. The API may or may not surface a summary.
Response tokens: the visible output your user reads.

Both are billed at the regular output rate. The reasoning stream is usually 3–20× larger than the response.

The real-world ratio. Across ~500 production calls we instrumented, the median reasoning-to-response ratio was 8.5×. P95 was 23×. The cheapest-looking models on the list are the ones with the largest hidden multiplier — be careful.
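
Because both streams are billed as output, a reasoning-to-response ratio of r means you pay for (1 + r) times the tokens your user actually sees. A minimal sketch of that arithmetic, using the ratios quoted above; the helper name billedMultiplier is ours for illustration and is not part of any SDK:

```javascript
// Illustrative helper (not part of any SDK): how many output tokens are billed
// per visible response token, given a reasoning-to-response ratio.
function billedMultiplier(reasoningToResponseRatio) {
  if (!Number.isFinite(reasoningToResponseRatio) || reasoningToResponseRatio < 0) {
    throw new RangeError('ratio must be a non-negative number');
  }
  // Every visible token is billed once, plus `ratio` hidden tokens alongside it.
  return 1 + reasoningToResponseRatio;
}

console.log(billedMultiplier(8.5)); // median call: 9.5
console.log(billedMultiplier(23));  // P95 call: 24
```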

The math, by effort level

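Most reasoning models expose an effort control. OpenAI calls it reasoning_effort, with values such as low, medium, and high; others, such as Claude's extended thinking, take a thinking-token budget instead. Defaults vary. Real numbers from a typical "explain this code change" query: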

Effort	Reasoning tokens	Response tokens	Total output tokens	Cost per call @ $10/1M output tokens
`low`	180	220	400	0.4¢
`medium`	1,200	250	1,450	1.45¢
`high`	3,800	280	4,080	4.08¢

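At high, you are billed for 4,080 output tokens to deliver a 280-token response: roughly 14.6× what the visible answer would suggest, and about 10× the cost of the same call at low. Across a workload of 10,000 calls/day, the gap between low and high is 3.68¢ × 10,000 = $368/day, or about $11,000 over a 30-day month, for the same model on the same question.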
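
The cost column in the table is just total output tokens times the output rate. The sketch below reproduces it and projects a per-call cost onto a daily volume. The function names and the 30-day month are assumptions made for illustration, and $10/1M is the rate used in the table, not a quoted price for any particular model:

```javascript
// Output rate used in the table above, in dollars per 1M output tokens.
const OUTPUT_RATE_USD_PER_MILLION = 10;

// Cost of one call in cents: reasoning and response tokens are billed alike.
function callCostCents(reasoningTokens, responseTokens, ratePerMillion = OUTPUT_RATE_USD_PER_MILLION) {
  const totalOutput = reasoningTokens + responseTokens;
  // dollars = tokens * rate / 1e6; cents = dollars * 100, so tokens * rate / 1e4.
  return (totalOutput * ratePerMillion) / 1e4;
}

// Projected monthly spend in dollars for a steady daily call volume.
// daysPerMonth = 30 is an assumption; adjust it to your billing period.
function monthlyCostUsd(costCentsPerCall, callsPerDay, daysPerMonth = 30) {
  return (costCentsPerCall / 100) * callsPerDay * daysPerMonth;
}

const low = callCostCents(180, 220);   // 0.4 cents, matching the table
const high = callCostCents(3800, 280); // 4.08 cents, matching the table
console.log({ low, high });
console.log(monthlyCostUsd(high - low, 10_000)); // monthly gap between low and high
```

Plugging in your own token counts and rate gives a quick check on what a change of effort level costs before you ship it.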

How to measure your actual consumption

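OpenAI's Responses API returns usage.output_tokens_details.reasoning_tokens, and its Chat Completions API returns usage.completion_tokens_details.reasoning_tokens; for other providers, check the usage object for an equivalent count. Log it for every call. The shape of your reasoning-to-response distribution is the single most useful input for capacity planning.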

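const response = await openai.responses.create({ ... });
// output_tokens already includes the hidden reasoning tokens.
const { output_tokens } = response.usage;
const reasoning = response.usage.output_tokens_details?.reasoning_tokens ?? 0;
const visible = output_tokens - reasoning;

// Skip the ratio when nothing visible was produced, to avoid dividing by zero.
if (visible > 0) metrics.histogram('llm.reasoning_ratio', reasoning / visible);
// Cost in cents at $10 per 1M output tokens; substitute your model's rate.
metrics.counter('llm.reasoning_cost_cents', reasoning * 10 / 1e6 * 100);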

Within a week you'll have a clear picture of where reasoning is paying off and where it's just spinning. Then route accordingly.
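
Once the ratios are logged, the distribution can be summarized with a median and a P95. A minimal sketch, assuming you have exported per-call token counts as plain objects. The field names reasoningTokens and outputTokens and the nearest-rank percentile method are choices made for this example, not something any API prescribes:

```javascript
// Nearest-rank percentile over an array of numbers (p in (0, 100]).
function percentile(values, p) {
  if (values.length === 0) throw new RangeError('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Reasoning-to-response ratio per call. Calls with no visible tokens are
// skipped to avoid dividing by zero.
function reasoningRatios(calls) {
  return calls
    .map(({ reasoningTokens, outputTokens }) => ({
      reasoningTokens,
      visible: outputTokens - reasoningTokens,
    }))
    .filter(({ visible }) => visible > 0)
    .map(({ reasoningTokens, visible }) => reasoningTokens / visible);
}

// Sample built from the three rows of the effort table; outputTokens includes
// reasoning tokens, as in the usage object.
const sample = [
  { reasoningTokens: 180, outputTokens: 400 },
  { reasoningTokens: 1200, outputTokens: 1450 },
  { reasoningTokens: 3800, outputTokens: 4080 },
];
const ratios = reasoningRatios(sample);
console.log('median', percentile(ratios, 50), 'p95', percentile(ratios, 95));
```

With only three samples the P95 is simply the largest value; the summary becomes meaningful once you have a few hundred calls per route.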

When reasoning models are worth it

Math, formal logic, planning: real, measurable quality jumps. The tax pays for itself.
Multi-step code synthesis: especially anything involving refactoring across files.
Hard extraction from unstructured inputs (legal, medical) where a wrong answer is more expensive than the call.
Anything with an objective verifier: tests pass / they don't, the JSON validates / it doesn't.

When to stay with a regular chat model

Chat, RAG, summarization, drafting: chat models tie or win on cost-per-acceptable-answer.
Creative writing, marketing copy: reasoning models often produce stilted output.
High-volume / low-stakes classification: a small "mini"-tier model wins on cost by 50–100×.
Real-time interactive UX: reasoning latency is 5–30s — your user notices.
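
Cost-per-acceptable-answer can be made concrete by dividing the cost of a call by the fraction of answers that pass your acceptance check. This is one common way to define the metric, not necessarily the definition behind the comparisons above, and the costs and acceptance rates in the example are made-up placeholders to be replaced with your own measurements:

```javascript
// Expected spend to obtain one acceptable answer, assuming rejected answers are
// retried or discarded at the same per-call cost.
function costPerAcceptableAnswer(costCentsPerCall, acceptanceRate) {
  if (!(acceptanceRate > 0 && acceptanceRate <= 1)) {
    throw new RangeError('acceptanceRate must be in (0, 1]');
  }
  return costCentsPerCall / acceptanceRate;
}

// Placeholder numbers for illustration only; measure your own.
const chatCost = costPerAcceptableAnswer(0.5, 0.8);
const reasoningCost = costPerAcceptableAnswer(2.0, 0.95);
console.log(
  chatCost < reasoningCost
    ? 'chat model is cheaper per acceptable answer'
    : 'reasoning model is cheaper per acceptable answer'
);
```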

The mixed pattern that usually works

The pattern we've shipped a dozen times:

Cheap, fast classifier reads the user input and decides difficulty.
"Easy" → mini chat model.
"Medium" → flagship chat model.
"Hard" → reasoning model at medium effort.
"Critical" (rare) → reasoning at high, with a chat-model second-opinion check.

Routing accuracy of 90%+ is achievable with a small fine-tuned classifier or even a careful zero-shot prompt. The cost savings are typically 60–80% versus reaching for the reasoning model every time.
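
The tiers above amount to a lookup from a difficulty label to a model and an effort setting. A minimal sketch: the tier names mirror the list, while the model identifiers, the shape of a route, and the secondOpinion field are hypothetical placeholders rather than real model names or SDK options:

```javascript
// Hypothetical routing table; replace the model strings with the ones you use.
const ROUTES = new Map([
  ['easy',     { model: 'mini-chat-model',     effort: null }],
  ['medium',   { model: 'flagship-chat-model', effort: null }],
  ['hard',     { model: 'reasoning-model',     effort: 'medium' }],
  ['critical', { model: 'reasoning-model',     effort: 'high', secondOpinion: 'flagship-chat-model' }],
]);

// `classify` is whatever cheap classifier you run first; it is expected to
// return one of the keys of ROUTES.
function route(userInput, classify) {
  const difficulty = classify(userInput);
  const target = ROUTES.get(difficulty);
  if (!target) {
    // Unknown label: fall back to the flagship chat model. This default is a
    // choice made for this sketch, not something the pattern prescribes.
    return { difficulty: 'medium', ...ROUTES.get('medium') };
  }
  return { difficulty, ...target };
}

// Toy stand-in classifier for demonstration only: routes by input length.
const toyClassifier = (text) => (text.length > 200 ? 'hard' : 'easy');
console.log(route('What time zone is UTC+2?', toyClassifier));
```

Keeping the table as data rather than branching logic makes it easy to log which tier each call took, which is what you need to check the classifier against the reasoning-ratio metrics.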

Reasoning models are extraordinary tools and almost certainly the right default for the highest-stakes 5% of your traffic. They are also the wrong default for the other 95%. Track the ratio, route the rest.

Live reasoning-model prices are on the pricing list; sample cost math is in the calculator.
