AI Fees
Compare token prices across every major LLM provider — find where your spend goes furthest.
At a glance
Cost calculator
Top 5 cheapest for 1M in + 1M out
All models & prices
Prices per 1M tokens. Click a model to pin it for comparison; click a price to copy it.
| Provider | Model | Input ($/1M) | Output ($/1M) | Context | Total* |
|---|---|---|---|---|---|
7 ways to cut your LLM bill
Battle-tested cost-saving tactics. Full guide in the blog.
Right-size to mini/nano variants
For classification, routing, summarization and most CRUD-style tasks, the mini and nano tiers often deliver around 90% of the quality at 5–15% of the cost. Default to the smallest model that passes your eval.
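A minimal sketch of the "smallest model that passes your eval" rule, assuming you already score each candidate on your own eval set. The model names, prices and scores below are hypothetical placeholders, not benchmark results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical model name
    price_per_1m: float  # blended $/1M tokens (illustrative)
    eval_score: float    # pass rate on your own eval set, 0.0 to 1.0


def pick_cheapest_passing(candidates: list[Candidate], threshold: float) -> Candidate | None:
    """Return the cheapest candidate whose eval score meets the threshold, or None."""
    passing = [c for c in candidates if c.eval_score >= threshold]
    return min(passing, key=lambda c: c.price_per_1m, default=None)


if __name__ == "__main__":
    models = [
        Candidate("premium-model", 15.00, 0.97),
        Candidate("mini-model", 0.60, 0.93),
        Candidate("nano-model", 0.10, 0.81),
    ]
    choice = pick_cheapest_passing(models, threshold=0.90)
    print(choice.name if choice else "nothing passes; stay on the premium tier")
```

With a 0.90 bar, the nano tier fails and the mini tier wins on price; raise or lower the bar and the choice moves with it.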
Use prompt caching for repeated prefixes
If your system prompt or RAG context repeats across calls, cache it. Anthropic, OpenAI and Google all offer discounted cached-input pricing, typically 10–25% of the regular input rate on current models.
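A back-of-the-envelope sketch of how caching changes input spend. It assumes cache hits are billed at a flat 10% of the base input rate, that the first call populates the cache at the normal rate and every later call hits it; it ignores cache expiry and any cache-write surcharge or storage fee a provider may add. Prices and token counts are illustrative.

```python
from __future__ import annotations


def input_cost_usd(prefix_tokens: int, fresh_tokens: int, calls: int,
                   price_per_1m: float, cached_rate: float | None = None) -> float:
    """Total input cost for `calls` requests that share the same prompt prefix.

    cached_rate: price multiplier for cache hits (0.10 = 10% of base), or None
    for no caching. Assumes call 1 writes the cache at the normal rate and
    every later call is a hit; cache expiry and write surcharges are ignored.
    """
    per_token = price_per_1m / 1_000_000
    if cached_rate is None or calls == 0:
        return calls * (prefix_tokens + fresh_tokens) * per_token
    first_call = (prefix_tokens + fresh_tokens) * per_token
    later_calls = (calls - 1) * (prefix_tokens * cached_rate + fresh_tokens) * per_token
    return first_call + later_calls


if __name__ == "__main__":
    args = dict(prefix_tokens=10_000, fresh_tokens=500, calls=1_000, price_per_1m=3.00)
    plain = input_cost_usd(**args)
    cached = input_cost_usd(**args, cached_rate=0.10)
    print(f"no cache: ${plain:.2f}, cached: ${cached:.2f}, saved {1 - cached / plain:.0%}")
```

With a 10,000-token shared prefix, 500 fresh tokens per call, 1,000 calls and a hypothetical $3/1M input price, this gives $31.50 without caching versus about $4.53 with it, roughly 86% less.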
Saves 75–90% on input
Cap output tokens explicitly
Always set max_tokens (or your provider's equivalent output limit). A chatbot that should answer in 200 tokens but is allowed 8K will occasionally produce a 4K essay, billed at output prices that are 2–10× your input rate.
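The cap matters because it bounds the worst case: a call cannot bill more output than its limit, assuming the provider enforces the cap on all billed output. A minimal sketch of that bound, using a hypothetical output price of $10 per 1M tokens:

```python
def worst_case_output_cost_usd(max_tokens: int, output_price_per_1m: float) -> float:
    """Upper bound on one call's output spend, assuming billed output never exceeds max_tokens."""
    return max_tokens * output_price_per_1m / 1_000_000


if __name__ == "__main__":
    price = 10.00  # hypothetical output price, $/1M tokens
    for cap in (200, 8_192):
        per_call = worst_case_output_cost_usd(cap, price)
        print(f"cap {cap:>5}: at most ${per_call:.5f}/call, ${per_call * 100_000:,.2f} per 100k calls")
```

At that price, a 200-token cap bounds each call at $0.002, while an 8,192-token cap allows up to $0.08192, about 41× more.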
Use the Batch API for async work
OpenAI, Anthropic and Google offer batch tiers at roughly 50% off in exchange for asynchronous processing with up to a 24-hour turnaround. Ideal for enrichment, classification, embeddings and overnight ETL.
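The sync-vs-batch decision mostly reduces to whether the job can wait out the batch window. A minimal sketch, assuming a flat 50% discount and a 24-hour window; both are approximations, so confirm current terms with each provider. The function name `choose_tier` is illustrative.

```python
from __future__ import annotations

BATCH_DISCOUNT = 0.50     # approximate; confirm current terms per provider
BATCH_WINDOW_HOURS = 24   # results may take up to this long to arrive


def choose_tier(sync_cost_usd: float, deadline_hours: float) -> tuple[str, float]:
    """Pick batch when the job can wait out the full batch window, else stay synchronous."""
    if deadline_hours >= BATCH_WINDOW_HOURS:
        return "batch", sync_cost_usd * (1 - BATCH_DISCOUNT)
    return "sync", sync_cost_usd


if __name__ == "__main__":
    print(choose_tier(120.0, deadline_hours=48))  # overnight enrichment job
    print(choose_tier(120.0, deadline_hours=1))   # user-facing, needs answers now
```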
Saves 50%
RAG over context-stuffing
Don't dump entire wikis into the prompt. Retrieve only the relevant 2–5K tokens per query. A 1M-token prompt at $1/1M input costs $1 per call, and that adds up fast.
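The arithmetic from this tip as a minimal sketch. It counts only prompt (input) tokens at the tip's $1/1M example rate and ignores the cost of running retrieval itself (embeddings, vector store).

```python
def prompt_cost_usd(prompt_tokens: int, input_price_per_1m: float) -> float:
    """Input-side cost of one call."""
    return prompt_tokens * input_price_per_1m / 1_000_000


if __name__ == "__main__":
    price = 1.00  # $/1M input tokens, the tip's example rate
    stuffed = prompt_cost_usd(1_000_000, price)  # entire wiki pasted into the prompt
    retrieved = prompt_cost_usd(5_000, price)    # retrieved chunks, top of the 2-5K range
    print(f"per call: ${stuffed:.3f} stuffed vs ${retrieved:.3f} with RAG")
    print(f"per 10,000 calls: ${stuffed * 10_000:,.0f} vs ${retrieved * 10_000:,.0f}")
```

Over 10,000 calls that is $10,000 versus $50 on the input side alone.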
Saves 60–95% per call
Route by task complexity
Build a router: a cheap model handles simple cases, an expensive model handles hard ones. A classifier costing 0.1¢ per call can often divert around 80% of traffic away from premium tiers, depending on the workload.
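A minimal sketch of the blended cost of such a router. The per-call costs are hypothetical (0.1¢ for the cheap model, 1¢ for the premium one), the classifier costs 0.1¢ per call as in the tip, and it assumes the cheap model's answers on routed traffic are good enough; misroutes and retries are not modeled.

```python
def blended_cost_per_call(cheap_cost: float, premium_cost: float,
                          cheap_share: float, classifier_cost: float) -> float:
    """Average $ per call when every request pays for classification and
    `cheap_share` of traffic is then served by the cheap model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return classifier_cost + cheap_share * cheap_cost + (1 - cheap_share) * premium_cost


if __name__ == "__main__":
    premium_only = 0.01  # hypothetical: 1 cent per call on the premium model
    routed = blended_cost_per_call(cheap_cost=0.001, premium_cost=premium_only,
                                   cheap_share=0.80, classifier_cost=0.001)
    print(f"routed: ${routed:.4f}/call vs ${premium_only:.4f}/call, saved {1 - routed / premium_only:.0%}")
```

With 80% of traffic routed to the cheap model, the average drops from 1¢ to 0.38¢ per call, about 62% less.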
Saves 40–70%
Watch reasoning token consumption
Reasoning models (o-series, Claude with extended thinking, Gemini thinking models) generate hidden tokens that are billed as output. A 200-token answer might be preceded by 3,000 tokens of reasoning. Use them only when you need the extra quality.
Avoids 10x surprises
From the blog
A practical playbook: model right-sizing, prompt caching, batch tiers, output capping and the routing pattern that actually moves the needle.
Read the guide →
Three production tasks, three frontier models, one cost spreadsheet. Where each one wins and where you're overpaying.
Read the analysis →
A 200-token answer can hide 3,000 tokens of thinking. Here's how to measure it, when it's worth it, and when to stay with a regular chat model.
Read the breakdown →
50% off sounds great until you realize you waited 18 hours. The decision math behind sync vs batch for real workloads.
Read the math →
DeepSeek V3 and Alibaba Qwen offer eye-watering price-to-performance. Latency, reliability, compliance: what to weigh before flipping the switch.
Read the verdict →
How AI pricing works
Tokens, not words
LLMs bill by token. A token is roughly 4 characters or 0.75 words of English. "Hello, world!" is about 4 tokens. Prices on this page are per 1 million tokens — the industry-standard unit.
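A minimal sketch of the 4-characters-per-token rule of thumb, useful for quick budgeting. It is only an approximation for English text; exact counts require the model's own tokenizer (for OpenAI models, for example, the tiktoken library) or the usage figures the provider reports.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate for English text: about 4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def estimate_cost_usd(text: str, price_per_1m: float) -> float:
    """Approximate cost of sending `text` at a given $/1M-token rate."""
    return estimate_tokens(text) * price_per_1m / 1_000_000


if __name__ == "__main__":
    print(estimate_tokens("Hello, world!"))  # 13 characters / 4, rounded up -> 4
```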
Input vs output
Providers charge separate rates for input tokens (your prompt) and output tokens (the response). Output tokens are usually 2–10× more expensive than input tokens, so long answers cost more: keep outputs tight.
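A minimal sketch of a per-request cost with separate input and output rates. The prices ($1 in, $5 out per 1M tokens) are hypothetical.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Cost of one request with separate input and output rates ($ per 1M tokens)."""
    return (input_tokens * input_price_per_1m + output_tokens * output_price_per_1m) / 1_000_000


if __name__ == "__main__":
    total = request_cost_usd(2_000, 1_000, input_price_per_1m=1.00, output_price_per_1m=5.00)
    output_share = 1_000 * 5.00 / 1_000_000 / total
    print(f"${total:.4f} per request, {output_share:.0%} of it from output")
```

For 2,000 input and 1,000 output tokens this comes to $0.007, and the 1,000 output tokens account for about 71% of it.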
Context window
The context window is the maximum number of tokens a single request can hold. A bigger window lets you pass more data per call but generally does not change the per-token rate. A 1M-token context holds roughly 750,000 words of English, enough for several full-length novels.
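A minimal sketch of two context-window checks: converting a token budget to approximate English words (0.75 words per token), and testing whether a prompt plus its reserved output fits. The fit check assumes, as on many models, that input and output share one window; some providers publish separate input and output limits instead.

```python
WORDS_PER_TOKEN = 0.75  # rough average for English text


def tokens_to_words(tokens: int) -> int:
    """Approximate number of English words that fit in `tokens` tokens."""
    return int(tokens * WORDS_PER_TOKEN)


def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fit in one shared window."""
    return prompt_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    print(tokens_to_words(200_000))                  # 150000
    print(tokens_to_words(1_000_000))                # 750000
    print(fits_in_context(195_000, 8_000, 200_000))  # False: leave room for the answer
```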
Tiered pricing
Some models drop the per-token price after volume thresholds. Rows marked [tiered] use volume-based pricing — your effective rate depends on how many tokens you actually send.
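A minimal sketch of how an effective rate falls out of tiered pricing. The two-tier schedule is hypothetical, and the sketch assumes graduated tiers (each rate applies only to the tokens inside its band, like tax brackets); some schemes instead reprice all tokens once a threshold is crossed, so check the terms behind each [tiered] row.

```python
from __future__ import annotations

# Hypothetical schedule: (tokens in this band, $ per 1M tokens).
# A band size of None means "all remaining tokens".
TIERS: list[tuple[int | None, float]] = [
    (100_000_000, 2.00),  # first 100M tokens
    (None, 1.50),         # everything beyond 100M
]


def tiered_cost_usd(tokens: int, tiers: list[tuple[int | None, float]] = TIERS) -> float:
    """Graduated pricing: each band's rate applies only to tokens inside that band."""
    remaining = tokens
    total = 0.0
    for band_size, price_per_1m in tiers:
        if remaining <= 0:
            break
        chunk = remaining if band_size is None else min(remaining, band_size)
        total += chunk * price_per_1m / 1_000_000
        remaining -= chunk
    return total


def effective_rate_per_1m(tokens: int, tiers: list[tuple[int | None, float]] = TIERS) -> float:
    """Average $ per 1M tokens actually paid at this volume."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return tiered_cost_usd(tokens, tiers) / tokens * 1_000_000


if __name__ == "__main__":
    for volume in (50_000_000, 300_000_000):
        print(f"{volume:,} tokens: ${tiered_cost_usd(volume):,.2f}, "
              f"effective ${effective_rate_per_1m(volume):.2f}/1M")
```

On this schedule, 300M tokens cost $500, an effective $1.67 per 1M instead of the $2.00 headline rate.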
Reasoning tokens
Reasoning models (o-series, Claude with extended thinking, Gemini thinking models) generate hidden "thinking" tokens before responding. These are billed as output. They often produce better answers, but at a higher effective cost.
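A minimal sketch of why hidden reasoning changes the bill, using the 200-token answer and 3,000 reasoning tokens from the tips section and a hypothetical $10/1M output price for both models. Real reasoning usage varies per request; providers report it in the response (OpenAI's Chat Completions API, for example, under `usage.completion_tokens_details.reasoning_tokens`).

```python
def output_bill_usd(visible_tokens: int, reasoning_tokens: int, output_price_per_1m: float) -> float:
    """Reasoning tokens are billed as output whether or not you see them."""
    return (visible_tokens + reasoning_tokens) * output_price_per_1m / 1_000_000


if __name__ == "__main__":
    price = 10.00  # hypothetical output price, $/1M tokens, same for both models here
    chat = output_bill_usd(200, 0, price)
    reasoning = output_bill_usd(200, 3_000, price)
    print(f"chat: ${chat:.4f}, reasoning: ${reasoning:.4f} ({reasoning / chat:.0f}x)")
```

The same 200-token answer costs 16× more once 3,000 reasoning tokens are billed alongside it.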
Source quality
Each row shows a source mark: ● official (from the provider directly), ◐ secondary aggregator, ○ community-reported. We prefer official sources and re-verify every 6 hours.
Browse by provider
Click a tile to filter the pricing list above.
Frequently asked
How is LLM pricing calculated?
LLM providers charge based on tokens — small chunks of text roughly equal to 4 characters or 0.75 words. Pricing is typically split into input tokens (your prompt) and output tokens (the model's response).
Output tokens typically cost 2–10× more than input tokens. AI Fees lists prices per 1 million tokens, the standard industry unit.
What is the cheapest LLM API right now?
The cheapest model and provider can change from week to week. Our live dashboard updates every 6 hours and shows the current cheapest input cost in the stats strip at the top.
Models from DeepSeek, Alibaba and Google tend to dominate the budget tier — but always verify the latest by sorting the table by "Input" or "Total".
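A minimal sketch of why the sort column matters. It assumes "Total" means the cost of 1M input plus 1M output tokens (as in the "Top 5 cheapest" view); the providers, models and prices are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Row:
    provider: str
    model: str
    input_per_1m: float   # $ per 1M input tokens
    output_per_1m: float  # $ per 1M output tokens

    @property
    def total(self) -> float:
        """Cost of 1M input plus 1M output tokens."""
        return self.input_per_1m + self.output_per_1m


def cheapest_by(rows: list[Row], column: str = "total", n: int = 5) -> list[Row]:
    """Return the n cheapest rows, sorted by 'input' or 'total'."""
    keys = {
        "input": lambda r: r.input_per_1m,
        "total": lambda r: r.total,
    }
    return sorted(rows, key=keys[column])[:n]


if __name__ == "__main__":
    rows = [
        Row("provider-a", "model-x", 0.30, 1.20),
        Row("provider-b", "model-y", 0.10, 2.00),
        Row("provider-c", "model-z", 1.00, 4.00),
    ]
    print([r.model for r in cheapest_by(rows, "input")])  # model-y first
    print([r.model for r in cheapest_by(rows, "total")])  # model-x first
```

Here the cheapest model by input price is not the cheapest by total, which is why output-heavy workloads should sort by "Total".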
Are these prices official?
Where possible, yes: our automated pipeline pulls from official provider pricing pages every 6 hours via GitHub Actions. Each model row shows a source indicator so you can tell which prices are official: ● official, ◐ secondary aggregator, ○ community-reported.
Always verify directly with the provider before contracting.
What does context window mean?
The context window is the maximum number of tokens a model can read in a single request. A 200K context can fit roughly a 150,000-word document.
Larger context does not change the per-token price, but it lets you pass more data per call — useful for RAG and long-document use cases.
How often is the data refreshed?
Pricing is refreshed every 6 hours via an automated GitHub Actions workflow. The live badge at the top shows the timestamp of the last successful refresh.
If a live fetch fails, the page falls back to a recent snapshot so you always see data.
Can I share my comparison?
Yes. Your active filter, search term, calculator inputs and pinned models are saved in the URL. Copy the address bar to share a deep link, or use the share button in the top bar to copy it to your clipboard.
Is AI Fees free to use?
Yes — the entire tool is free, no signup, no ads on this page. We don't collect personal data. AI Fees is operated as a public good by Philipp Stegmann and friends at deine-ai.com.