AI Fees

Compare token prices across every major LLM provider — find where your spend goes furthest.

Cost calculator

Top 5 cheapest for 1M in + 1M out

All models & prices

Per 1M tokens. Click a model to pin for compare; click a price to copy.

Provider Model Input ($/1M) Output ($/1M) Context Total*

7 ways to cut your LLM bill

Battle-tested cost-saving tactics. Full guide in the blog.

1

Right-size to mini/nano variants

For classification, routing, summarization and most CRUD-style tasks, the mini and nano tiers can often deliver around 90% of the quality at 5–15% of the cost. Default to the smallest model that passes your eval.

Saves up to 85%
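The rule above can be sketched as a selection over your own eval results. This is a minimal sketch, assuming you already have a pass rate per model from your eval; the model names, prices and scores below are hypothetical placeholders, not figures from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str             # hypothetical model label
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens
    eval_score: float     # pass rate on your own eval set, 0.0 to 1.0


def call_cost(c: Candidate, in_tokens: int, out_tokens: int) -> float:
    """USD cost of one call with the given token counts."""
    return (in_tokens * c.input_per_1m + out_tokens * c.output_per_1m) / 1_000_000


def cheapest_passing(candidates: list[Candidate], min_score: float,
                     in_tokens: int, out_tokens: int) -> Optional[Candidate]:
    """Lowest-cost model whose eval score clears the bar, or None if none does."""
    passing = [c for c in candidates if c.eval_score >= min_score]
    if not passing:
        return None
    return min(passing, key=lambda c: call_cost(c, in_tokens, out_tokens))


# Hypothetical prices and eval scores, for illustration only.
models = [
    Candidate("flagship", 3.00, 15.00, 0.97),
    Candidate("mini", 0.40, 1.60, 0.95),
    Candidate("nano", 0.10, 0.40, 0.88),
]

choice = cheapest_passing(models, min_score=0.93, in_tokens=1_000, out_tokens=200)
print(choice.name if choice else "no model passes")
```

Rerun the selection whenever prices or eval results change; a cheaper model that falls below the bar is never chosen.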
2

Use prompt caching for repeated prefixes

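If your system prompt or RAG context repeats across calls, cache it. Anthropic, OpenAI and Google all offer cached-input pricing, typically at 10–25% of the regular input rate for the cached portion; exact rates and cache-write charges vary by provider and model.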

Saves 75–90% on cached input
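The discount applies only to the cached part of the prompt. A minimal sketch of the arithmetic, assuming a flat cached-input multiplier (0.10 here, i.e. a 90% discount) and ignoring cache-write surcharges, which some providers bill above the base rate; the $3/1M price is hypothetical.

```python
def input_cost(prompt_tokens: int, cached_tokens: int,
               price_per_1m: float, cached_multiplier: float) -> float:
    """USD input cost for one call when `cached_tokens` of the prompt are read from cache.

    cached_multiplier is the cached-input price as a fraction of the regular rate
    (0.10 means cached tokens cost 10% of normal). Cache-write fees are not modelled.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    return (uncached + cached_tokens * cached_multiplier) * price_per_1m / 1_000_000


# 10K-token prompt whose first 9K tokens (system prompt + RAG prefix) repeat every call.
no_cache = input_cost(10_000, 0, 3.00, 0.10)
with_cache = input_cost(10_000, 9_000, 3.00, 0.10)
saving = 1 - with_cache / no_cache
print(f"${no_cache:.4f} -> ${with_cache:.4f} per call ({saving:.0%} less input spend)")
```

With 90% of the prompt cached at a 90% discount, total input spend falls by about 81%, because the uncached tail still pays the full rate.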
3

Cap output tokens explicitly

Always set max_tokens (or your provider's equivalent output limit). A chatbot that should answer in 200 tokens but is allowed 8K will occasionally produce a 4K essay, at output prices that are 2–10× your input rate.

Stops runaway bills
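Because the cap bounds how many output tokens a call can bill, it also bounds spend. A minimal sketch of that worst case, assuming the cap covers every billed output token (for reasoning models, check whether your provider counts hidden reasoning tokens against it); the $10/1M output price and the call volume are hypothetical.

```python
def worst_case_output_cost(max_tokens: int, output_per_1m: float, calls: int = 1) -> float:
    """Upper bound on output spend in USD if every call runs all the way to max_tokens."""
    return max_tokens * output_per_1m * calls / 1_000_000


calls_per_month = 100_000
tight = worst_case_output_cost(200, 10.00, calls_per_month)
loose = worst_case_output_cost(8_192, 10.00, calls_per_month)
print(f"cap 200: at most ${tight:,.0f}/month; cap 8K: at most ${loose:,.0f}/month")
```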
4

Use the Batch API for async work

OpenAI, Anthropic and Google offer batch tiers at roughly 50% off for jobs that can wait for results within a 24-hour window. They suit enrichment, classification, embeddings and overnight ETL.

Saves 50%
5

RAG over context-stuffing

Don't dump entire wikis into the prompt. Retrieve the relevant 2–5K tokens per query instead. A 1M-token prompt at $1/1M input costs $1 per call, while a 5K-token prompt at the same rate costs $0.005, and the difference adds up fast.

Saves 60–95% per call
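Retrieval keeps the prompt inside a fixed token budget. A minimal sketch of the packing step, assuming your retriever already returns (score, token_count, text) tuples; the greedy highest-score-first policy, the chunk names and the 5K budget are illustrative choices, not a prescribed method.

```python
def pack_context(chunks: list[tuple[float, int, str]],
                 budget_tokens: int) -> tuple[list[str], int]:
    """Greedily add the highest-scoring chunks that still fit in budget_tokens.

    Returns the selected chunk texts (best first) and the tokens they use.
    """
    selected: list[str] = []
    used = 0
    for _score, tokens, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        if used + tokens <= budget_tokens:
            selected.append(text)
            used += tokens
    return selected, used


retrieved = [
    (0.91, 1_800, "pricing FAQ"),
    (0.85, 2_500, "billing docs"),
    (0.78, 900, "rate limits"),
    (0.40, 3_000, "changelog"),
]
context, used = pack_context(retrieved, budget_tokens=5_000)
price_per_1m = 1.00  # the $1/1M input rate from the example in the tip
print(context, used, f"${used * price_per_1m / 1_000_000:.4f} vs $1.0000 for a 1M-token prompt")
```

Here the two best chunks use 4,300 tokens; the third would overflow the budget and is skipped.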
6

Route by task complexity

Build a router: a cheap model handles simple cases and an expensive model handles hard ones. If a classifier costing 0.1¢ per call can send 80% of traffic to the cheap model, only the remaining 20% pays premium rates.

Saves 40–70%
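A minimal sketch of the router and its blended cost. The word-count check in `looks_simple` is a hypothetical stand-in for a real classifier, the model names are placeholders, and the per-request costs are illustrative.

```python
def looks_simple(prompt: str) -> bool:
    """Hypothetical stand-in for a cheap classifier: short prompts count as simple."""
    return len(prompt.split()) < 50


def route(prompt: str) -> str:
    """Send simple prompts to the cheap tier and everything else to the premium tier."""
    return "cheap-model" if looks_simple(prompt) else "premium-model"


def blended_cost(classifier_cost: float, cheap_cost: float,
                 premium_cost: float, cheap_share: float) -> float:
    """Average USD per request: every request pays for classification, then one model."""
    return classifier_cost + cheap_share * cheap_cost + (1 - cheap_share) * premium_cost


# 0.1 cent classifier, $0.002 cheap call, $0.02 premium call, 80% routed cheap.
routed = blended_cost(0.001, 0.002, 0.02, 0.80)
premium_only = 0.02
print(route("Reset my password"), f"${routed:.4f} vs ${premium_only:.4f} per request")
```

At these numbers the router cuts average cost by about 67%; the saving shrinks as the cheap share falls or the classifier gets more expensive.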
7

Watch reasoning token consumption

Reasoning models (o-series, Claude with thinking, Gemini Thinking) generate hidden tokens billed as output. A 200-token answer might be preceded by 3,000 reasoning tokens, so you pay for 3,200 output tokens. Use them only when you need the extra quality.

Avoids 10x surprises
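The effect on a single call is easy to check. A minimal sketch, assuming hidden reasoning tokens are billed at the output rate as described above; the $2/$8 per-1M prices and the token counts are hypothetical.

```python
def call_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int,
              input_per_1m: float, output_per_1m: float) -> float:
    """USD cost of one call; hidden reasoning tokens are billed like visible output."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_per_1m + billed_output * output_per_1m) / 1_000_000


without_reasoning = call_cost(500, 200, 0, 2.00, 8.00)
with_reasoning = call_cost(500, 200, 3_000, 2.00, 8.00)
print(f"${without_reasoning:.4f} vs ${with_reasoning:.4f} "
      f"({with_reasoning / without_reasoning:.1f}x)")
```

The same 200-token answer costs roughly ten times as much once 3,000 reasoning tokens are added.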

How AI pricing works

Tokens, not words

LLMs bill by token. A token is roughly 4 characters or 0.75 words of English. "Hello, world!" is about 4 tokens. Prices on this page are per 1 million tokens — the industry-standard unit.
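For quick budgeting, the rule of thumb above can be turned into an estimator. This is a heuristic sketch for English text only; for exact counts, use the provider's own tokenizer, since counts differ between models.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough English token count using the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Same heuristic from a word count: ~0.75 words per token."""
    return math.ceil(word_count / 0.75)


print(estimate_tokens("Hello, world!"))  # 13 characters -> 4
print(estimate_tokens_from_words(750))   # 750 words -> 1000
```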

Input vs output

Providers charge separate rates for input tokens (your prompt) and output tokens (the response). Output is usually 2–10× more expensive than input. Long answers cost more — keep outputs tight.
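Putting the two rates together, the cost of one request is each token count times its per-1M price, divided by 1,000,000. A minimal sketch with hypothetical prices of $0.50 input and $1.50 output per 1M tokens; the last line reproduces the "1M in + 1M out" comparison used for the top-5 list, which is simply input price plus output price.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_1m: float, output_per_1m: float) -> float:
    """USD cost of one request, with prices quoted per 1M tokens."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


# A 2,000-token prompt with a 500-token answer.
print(f"${request_cost(2_000, 500, 0.50, 1.50):.5f}")
# Benchmark workload of 1M input + 1M output tokens = input price + output price.
print(request_cost(1_000_000, 1_000_000, 0.50, 1.50))
```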

Context window

The context window is the maximum number of tokens a single request can hold. A bigger context lets you pass more data per call but does not change the per-token rate. A 1M-token context fits roughly 750,000 English words, enough for several full-length novels.
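Whether a document fits is a token-budget check. A minimal sketch, assuming the prompt and the reserved output share one window (common, but check your provider's limits, as some cap output separately) and using the ~0.75 words per token rule of thumb; the 4,096-token output reserve is an arbitrary example.

```python
import math


def words_to_tokens(words: int) -> int:
    """Rough English estimate: ~0.75 words per token."""
    return math.ceil(words / 0.75)


def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


doc_tokens = words_to_tokens(150_000)  # ~200,000 tokens
print(fits_in_context(doc_tokens, 4_096, 200_000))    # False: no room left for the answer
print(fits_in_context(doc_tokens, 4_096, 1_000_000))  # True
```

A 150,000-word document therefore sits right at the edge of a 200K window: the prompt alone uses it up.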

Tiered pricing

Some models drop the per-token price after volume thresholds. Rows marked [tiered] use volume-based pricing — your effective rate depends on how many tokens you actually send.
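With volume tiers, the effective rate is total cost divided by total tokens. A minimal sketch assuming graduated tiers, where each price applies only to the tokens inside its band; the 100M-token threshold and the $2.00/$1.50 prices are hypothetical. Other schemes, such as a higher rate for prompts above a length threshold, need a different formula.

```python
from typing import Optional


def tiered_cost(tokens: int, tiers: list[tuple[Optional[int], float]]) -> float:
    """USD cost of `tokens` under graduated tiers.

    tiers: (upper_bound_tokens, price_per_1m) pairs in ascending order;
    use None as the upper bound of the last, open-ended tier.
    """
    cost, lower = 0.0, 0
    for upper, price_per_1m in tiers:
        if tokens <= lower:
            break
        top = tokens if upper is None else min(tokens, upper)
        cost += (top - lower) * price_per_1m / 1_000_000
        if upper is None:
            break
        lower = upper
    return cost


tiers = [(100_000_000, 2.00), (None, 1.50)]
monthly_tokens = 150_000_000
total = tiered_cost(monthly_tokens, tiers)
effective = total / (monthly_tokens / 1_000_000)
print(f"${total:,.2f} total, effective ${effective:.3f} per 1M")
```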

Reasoning tokens

Reasoning models (o-series, Claude with thinking, Gemini Thinking) generate hidden "thinking" tokens before responding. These are billed as output. They often produce better answers, but at a higher effective cost.

Source quality

Each row shows a source mark: ● official (from the provider directly), ◐ secondary aggregator, ○ community-reported. We prefer official sources and re-verify every 6 hours.

Frequently asked

How is LLM pricing calculated?

LLM providers charge based on tokens — small chunks of text roughly equal to 4 characters or 0.75 words. Pricing is typically split into input tokens (your prompt) and output tokens (the model's response).

Output tokens typically cost 2–10× more than input tokens. AI Fees lists prices per 1 million tokens, the standard industry unit.

What is the cheapest LLM API right now?

The cheapest model and provider can change from week to week. Our live dashboard updates every 6 hours and shows the current cheapest input cost in the stats strip at the top.

Models from DeepSeek, Alibaba and Google tend to dominate the budget tier — but always verify the latest by sorting the table by "Input" or "Total".

Are these prices official?

Mostly. Our automated pipeline pulls from official provider pricing pages every 6 hours via GitHub Actions. Where an official figure is not available, a secondary or community source is used, and each model row shows a source indicator: ● official, ◐ secondary aggregator, ○ community-reported.

Always verify directly with the provider before contracting.

What does context window mean?

The context window is the maximum number of tokens a model can read in a single request. A 200K context can fit roughly a 150,000-word document.

Larger context does not change the per-token price, but it lets you pass more data per call — useful for RAG and long-document use cases.

How often is the data refreshed?

Pricing is refreshed every 6 hours via an automated GitHub Actions workflow. The live badge at the top shows the timestamp of the last successful refresh.

If a live fetch fails, the page falls back to a recent snapshot so you always see data.

Can I share my comparison?

Yes. Your active filter, search term, calculator inputs and pinned models are saved in the URL. Copy the address bar to share a deep link, or use the share button in the top bar to copy it to your clipboard.

Is AI Fees free to use?

Yes — the entire tool is free, no signup, no ads on this page. We don't collect personal data. AI Fees is operated as a public good by Philipp Stegmann and friends at deine-ai.com.

Glossary

Token
~4 characters or 0.75 English words. The atomic unit LLMs read and bill on.
Input token
A token in your prompt. Usually cheaper than output. Optimize by trimming system prompts and using shorter context.
Output token
A token the model generates. Usually 2–10× the input rate. Reasoning tokens count as output.
Context window
Maximum tokens per single request. A 200K window fits ~150K words. Affects what you can do per call, not the per-token rate.
Reasoning model
Model that thinks before responding (o-series, Claude with thinking, Gemini Thinking). Better answers, but hidden tokens count as output.
Tiered pricing
Per-token price changes at certain volume thresholds. Shown with the [tiered] tag.
Cached input
Steep discounts on repeated prompt prefixes from Anthropic, OpenAI, Google. Not yet shown on this page.
Batch API
Async pricing tier (often 50% off) with longer turnaround, typically up to 24 hours. Good for non-realtime jobs.