AI Fees


Cost calculator


Top 5 cheapest for 1M in + 1M out
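The "1M in + 1M out" ranking comes down to simple arithmetic: add each model's per-1M input and output rates, then sort ascending. Here is a minimal Python sketch. The catalog is hypothetical, so the model names and prices are placeholders rather than live data, and `ModelPrice` and `cheapest` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    provider: str
    model: str
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens

    @property
    def total_1m_in_1m_out(self) -> float:
        # Cost of 1M input tokens plus 1M output tokens
        return self.input_per_1m + self.output_per_1m


def cheapest(models: list[ModelPrice], n: int = 5) -> list[ModelPrice]:
    """Return the n models with the lowest 1M-in + 1M-out total."""
    return sorted(models, key=lambda m: m.total_1m_in_1m_out)[:n]


# Hypothetical catalog for illustration only
catalog = [
    ModelPrice("provider-a", "large-model", 3.00, 15.00),
    ModelPrice("provider-a", "mini-model", 0.25, 1.00),
    ModelPrice("provider-b", "nano-model", 0.05, 0.40),
    ModelPrice("provider-c", "budget-model", 0.27, 1.10),
    ModelPrice("provider-c", "mid-model", 1.25, 10.00),
    ModelPrice("provider-d", "flash-model", 0.30, 2.50),
]

for m in cheapest(catalog):
    print(f"{m.provider}/{m.model}: ${m.total_1m_in_1m_out:.2f}")
```

This metric treats input and output as equally weighted. If your workload is mostly input, such as long prompts with short answers, a weighted total ranks models more realistically.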

All models & prices

Per 1M tokens.

	Provider	Model	Input ($/1M)	Output ($/1M)	Context	Total*

7 ways to cut your LLM bill

Battle-tested cost-saving tactics. Full guide in the blog.

1

Right-size to mini/nano variants

For classification, routing, summarization and most CRUD-style tasks, the mini and nano tiers deliver 90% of the quality at 5–15% of the cost. Default to the smallest model that passes your eval.

Saves up to 85%
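"Smallest model that passes your eval" is a simple selection rule. The sketch below assumes you have already scored each candidate on your own eval set. The models, pass rates and blended prices are hypothetical, and the rule returns the cheapest model that clears a quality bar you choose.

```python
from typing import NamedTuple, Optional


class EvalResult(NamedTuple):
    model: str
    pass_rate: float   # share of your eval cases the model passes
    usd_per_1m: float  # blended price per 1M tokens for your traffic mix


def pick_model(results: list[EvalResult], min_pass_rate: float) -> Optional[str]:
    """Return the cheapest model that clears the quality bar, or None."""
    passing = [r for r in results if r.pass_rate >= min_pass_rate]
    if not passing:
        return None
    return min(passing, key=lambda r: r.usd_per_1m).model


# Hypothetical eval results
results = [
    EvalResult("large", 0.95, 6.00),
    EvalResult("mini", 0.91, 0.60),
    EvalResult("nano", 0.84, 0.15),
]

print(pick_model(results, 0.90))  # mini
print(pick_model(results, 0.80))  # nano
```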

2

Use prompt caching for repeated prefixes

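If your system prompt or RAG context repeats across calls, cache it. Anthropic, OpenAI and Google all offer cached-input pricing, typically 10–25% of the regular input rate for cache hits. Check the fine print: some providers charge extra to write the cache, and caching only applies above a minimum prefix length.

Saves 75–90% on cached input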
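To see where the savings come from, compare one call's input cost with and without a cached prefix. This is a minimal sketch. It assumes a cached rate of 10% of the base price, a hypothetical $3/1M base rate, and a 10,000-token prefix plus 500 fresh tokens per call. Cache-write surcharges and storage fees are ignored.

```python
def input_cost(cached_tokens: int, fresh_tokens: int, base_per_1m: float,
               cached_ratio: float = 0.10) -> float:
    """Input cost in USD for one call.

    cached_ratio is the cached-input price as a fraction of the base rate
    (assumed 0.10 here; providers range roughly 0.10-0.25).
    Cache-write surcharges and storage fees are ignored in this sketch.
    """
    cached = cached_tokens * base_per_1m * cached_ratio / 1_000_000
    fresh = fresh_tokens * base_per_1m / 1_000_000
    return cached + fresh


BASE = 3.00  # hypothetical USD per 1M input tokens

no_cache = input_cost(0, 10_500, BASE)         # whole prompt billed at full rate
with_cache = input_cost(10_000, 500, BASE)     # 10K-token prefix served from cache
print(f"${no_cache:.5f} vs ${with_cache:.5f} per call")
```

The saving on the cached part alone is 90% under this assumption. The blended saving on the whole prompt is a bit lower, because the fresh tokens are still billed at the full rate.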

3

Cap output tokens explicitly

Always set an output cap (max_tokens or your provider's equivalent). A chatbot that should answer in 200 tokens but is allowed 8K will occasionally produce a 4K essay, billed at output prices that are typically 2–10× your input rate.

Stops runaway bills
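A cap turns an open-ended risk into a known worst case. This sketch uses a hypothetical helper and a hypothetical $10/1M output price. It shows the ceiling on output spend per call and per 100K calls for a loose 8K cap versus a tight 300-token cap.

```python
def worst_case_output_cost(max_tokens: int, output_per_1m: float) -> float:
    """Upper bound on output spend for one call if the model uses its whole cap."""
    return max_tokens * output_per_1m / 1_000_000


OUTPUT_PRICE = 10.00  # hypothetical USD per 1M output tokens

for cap in (8_000, 300):
    per_call = worst_case_output_cost(cap, OUTPUT_PRICE)
    print(f"cap={cap}: up to ${per_call:.3f}/call, ${per_call * 100_000:,.0f} per 100K calls")
```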

4

Use the Batch API for async work

OpenAI, Anthropic and Google offer batch tiers at ~50% off for jobs that can wait up to 24 hours for results. A good fit for enrichment, classification, embeddings and overnight ETL.

Saves 50%
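The batch decision is easy to quantify for a given job. This sketch assumes a flat 50% batch discount, which is typical but not universal. The job itself is hypothetical: 200K records at ~800 input and ~150 output tokens each, on a model priced at $0.40 in / $1.60 out per 1M tokens.

```python
def job_cost(n_requests: int, in_tokens: int, out_tokens: int,
             in_per_1m: float, out_per_1m: float,
             batch: bool = False, batch_discount: float = 0.5) -> float:
    """Total USD for a job of identical requests.

    batch_discount=0.5 assumes the typical ~50% batch tier; check your provider.
    """
    per_request = (in_tokens * in_per_1m + out_tokens * out_per_1m) / 1_000_000
    total = n_requests * per_request
    return total * (1 - batch_discount) if batch else total


# Hypothetical nightly enrichment job
sync = job_cost(200_000, 800, 150, 0.40, 1.60)
batched = job_cost(200_000, 800, 150, 0.40, 1.60, batch=True)
print(f"sync ${sync:.2f} vs batch ${batched:.2f}")
```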

5

RAG over context-stuffing

Don't dump entire wikis into the prompt. Retrieve the relevant 2–5K tokens per query instead. A 1M-token prompt at $1/1M input costs $1 per call, and that adds up fast.

Saves 60–95% per call
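Here is the arithmetic behind that example. It compares the input cost of a 1M-token stuffed prompt with a 5K-token retrieved context, using the $1/1M input rate from the tip. The helper name is illustrative, and the actual saving depends on how large your stuffed prompt would otherwise be.

```python
def prompt_cost(prompt_tokens: int, input_per_1m: float) -> float:
    """Input cost in USD for a single prompt."""
    return prompt_tokens * input_per_1m / 1_000_000


PRICE = 1.00  # USD per 1M input tokens, matching the figure in the tip

stuffed = prompt_cost(1_000_000, PRICE)   # entire knowledge base in every prompt
retrieved = prompt_cost(5_000, PRICE)     # only the retrieved chunks
print(f"${stuffed:.3f} vs ${retrieved:.3f} per call, {1 - retrieved / stuffed:.1%} saved")
```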

6

Route by task complexity

Build a router: cheap model handles simple cases, expensive model handles hard ones. A classifier costing 0.1¢ per call can offload 80% of traffic away from premium tiers.

Saves 40–70%
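A router has two parts: a cheap decision and the cost math that justifies it. In the sketch below, `naive_is_simple` is a toy stand-in for a real classifier. The per-call costs are hypothetical: 0.1¢ for the router, 0.2¢ for the cheap model, 2¢ for the premium model, with 80% of traffic routed to the cheap tier. Every call pays for the router, so routing only wins if the router is much cheaper than the premium model.

```python
from typing import Callable


def route(prompt: str, is_simple: Callable[[str], bool]) -> str:
    """Send simple prompts to the cheap tier and everything else to the premium tier."""
    return "cheap-model" if is_simple(prompt) else "premium-model"


def routed_cost_per_call(router: float, cheap: float, premium: float,
                         share_cheap: float) -> float:
    """Average USD per call: every call pays for the router, then one model."""
    return router + share_cheap * cheap + (1 - share_cheap) * premium


def naive_is_simple(prompt: str) -> bool:
    # Toy heuristic for illustration: treat short prompts as simple
    return len(prompt) < 200


print(route("Is this email spam? ...", naive_is_simple))  # cheap-model

avg = routed_cost_per_call(0.001, 0.002, 0.02, 0.80)
print(f"${avg:.4f} vs $0.0200 without routing")
```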

7

Watch reasoning token consumption

Reasoning models (o-series, Claude with thinking, Gemini Thinking) generate hidden tokens that are billed as output. A 200-token answer might carry 3,000 tokens of reasoning. Use them only when you need the extra quality.

Avoids 10× surprises
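You can measure this per response from the usage data the API returns. The sketch assumes an OpenAI Chat Completions-style usage object, where `completion_tokens` already includes `completion_tokens_details.reasoning_tokens`. Other providers report thinking tokens under different field names, so adapt the keys. The usage numbers and the $8/1M output price are hypothetical.

```python
def reasoning_breakdown(usage: dict, output_per_1m: float) -> dict:
    """Split billed output into hidden reasoning vs visible answer tokens.

    Assumes completion_tokens already includes the reasoning tokens.
    """
    billed_out = usage["completion_tokens"]
    reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    visible = billed_out - reasoning
    return {
        "visible": visible,
        "reasoning": reasoning,
        "output_cost": billed_out * output_per_1m / 1_000_000,
        # How many billed output tokens you pay for per visible answer token
        "multiplier": billed_out / visible if visible else float("inf"),
    }


usage = {
    "prompt_tokens": 150,
    "completion_tokens": 3_200,
    "completion_tokens_details": {"reasoning_tokens": 3_000},
}
print(reasoning_breakdown(usage, output_per_1m=8.00))
```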

From the blog

See all posts →

Cost optimization · May 15, 2026

How to cut your LLM bill by 80% in 2026

A practical playbook: model right-sizing, prompt caching, batch tiers, output capping and the routing pattern that actually moves the needle.

Read the guide →

Compare · May 10, 2026

GPT-5 vs Claude Sonnet 4.6 vs Gemini 2.5

Three production tasks, three frontier models, one cost spreadsheet. Where each one wins and where you're overpaying.

Read the analysis →

Reasoning · May 5, 2026

The hidden cost of reasoning models

A 200-token answer can hide 3,000 tokens of thinking. Here's how to measure it, when it's worth it, and when to stay with a regular chat model.

Read the breakdown →

Batch API · Apr 28, 2026

When the Batch API discount actually pays off

50% off sounds great until you realize you waited 18 hours. The decision math behind sync vs batch for real workloads.

Read the math →

Budget tier · Apr 20, 2026

DeepSeek and Qwen: production-ready?

DeepSeek V3 and Alibaba Qwen offer exceptional price-to-performance. Latency, reliability, compliance: what to weigh before flipping the switch.

Read the verdict →

How AI pricing works

Tokens, not words

LLMs bill by token. A token is roughly 4 characters or 0.75 words of English. "Hello, world!" is about 4 tokens. Prices on this page are per 1 million tokens — the industry-standard unit.
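To turn the rule of thumb into numbers, here is a rough Python estimator. The helper names are hypothetical and the result is only a heuristic. Exact counts depend on each model's tokenizer, so use the provider's own tokenizer or token-counting tools when precision matters. Rounding up keeps budget estimates on the safe side.

```python
import math


def tokens_from_chars(text: str) -> int:
    # ~4 characters per token for English; round up so budgets stay conservative
    return math.ceil(len(text) / 4)


def tokens_from_words(n_words: int) -> int:
    # ~0.75 words per token, i.e. ~1.33 tokens per word
    return math.ceil(n_words / 0.75)


print(tokens_from_chars("Hello, world!"))  # 4
print(tokens_from_words(1_000))            # 1334
```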

Input vs output

Providers charge separate rates for input tokens (your prompt) and output tokens (the response). Output is usually 2–10× more expensive than input. Long answers cost more — keep outputs tight.
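The per-request cost is the sum of the two sides, each scaled from the per-1M rate. This is a minimal sketch with a hypothetical model priced at $0.50 in / $4.00 out per 1M tokens. With these prices, a 500-token answer costs twice as much as the 2,000-token prompt that produced it.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_1m: float, output_per_1m: float) -> float:
    """USD cost of one request given per-1M-token prices."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


cost = request_cost(2_000, 500, 0.50, 4.00)
print(f"${cost:.4f}")  # $0.0030
```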

Context window

The context window is the maximum number of tokens a single request can hold. A bigger context lets you pass more data per call but does not change the per-token rate. A 1M-token context holds roughly 750,000 English words, the length of several novels.
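Before sending a large document, it helps to check that it fits with room left for the answer. The sketch assumes the prompt and the generated output share one context budget. This is true for many models, but some providers list separate input and output limits. The helper names and the 4,000-token output reservation are hypothetical.

```python
import math


def words_to_tokens(n_words: int) -> int:
    # Rule of thumb: 1 token is about 0.75 English words
    return math.ceil(n_words / 0.75)


def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Assumes prompt and generated output share one context budget."""
    return prompt_tokens + max_output_tokens <= context_window


manuscript = words_to_tokens(150_000)  # about 200,000 tokens
for window in (128_000, 200_000, 1_000_000):
    print(window, fits_in_context(manuscript, 4_000, window))
```

A 150,000-word document roughly fills a 200K window by itself, so in practice you need a larger window or a shorter input once you reserve space for the response.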

Tiered pricing

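Some models change the per-token price at a threshold: by usage volume, or for some long-context models by prompt size (a higher rate once a single prompt passes a cutoff). Rows marked [tiered] use threshold-based pricing, so your effective rate depends on how many tokens you actually send.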
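For volume tiers, the effective rate is easiest to see with a worked example. This sketch assumes graduated tiers, where each rate applies only to the tokens inside its band, like tax brackets. Some schedules instead reprice all tokens once a threshold is crossed, and prompt-size tiers work per request, so check your provider's terms. The tier schedule below is hypothetical.

```python
def tiered_cost(tokens, tiers):
    """Graduated pricing: each tier's rate applies only to tokens inside that tier.

    tiers: list of (upper_bound_tokens or None for unbounded, usd_per_1m),
    sorted by ascending upper bound.
    """
    cost, lower = 0.0, 0
    for upper, price in tiers:
        if tokens <= lower:
            break
        in_tier = (tokens if upper is None else min(tokens, upper)) - lower
        cost += in_tier * price / 1_000_000
        if upper is None:
            break
        lower = upper
    return cost


# Hypothetical schedule: first 100M tokens at $2.00/1M, the rest at $1.50/1M
tiers = [(100_000_000, 2.00), (None, 1.50)]
monthly = 250_000_000
cost = tiered_cost(monthly, tiers)
print(f"${cost:,.2f} total, effective ${cost / monthly * 1_000_000:.2f}/1M")
```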

Reasoning tokens

Reasoning models (o-series, Claude with thinking, Gemini Thinking) generate hidden "thinking" tokens before responding. These are billed as output. They often give better answers, at a higher effective cost.

Source quality

Each row shows a source mark: ● official (from the provider directly), ◐ secondary aggregator, ○ community-reported. We prefer official sources and re-verify every 6 hours.


Frequently asked

How is LLM pricing calculated?

LLM providers charge based on tokens — small chunks of text roughly equal to 4 characters or 0.75 words. Pricing is typically split into input tokens (your prompt) and output tokens (the model's response).

Output tokens typically cost 2–10× more than input tokens. AI Fees lists prices per 1 million tokens, the standard industry unit.

What is the cheapest LLM API right now?

The cheapest model and provider can change from week to week. Our live dashboard updates every 6 hours and shows the current cheapest input cost in the stats strip at the top.

Models from DeepSeek, Alibaba and Google tend to dominate the budget tier, but always check the latest ranking by sorting the table by "Input" or "Total".

Are these prices official?

Mostly. Our automated pipeline pulls from official provider pricing pages every 6 hours via GitHub Actions; where no official figure is available, a secondary or community source is used instead. Each model row shows a source indicator: ● official, ◐ secondary aggregator, ○ community-reported.

Always verify directly with the provider before contracting.

What does context window mean?

The context window is the maximum number of tokens a model can read in a single request. A 200K context can fit roughly a 150,000-word document.

Larger context does not change the per-token price, but it lets you pass more data per call — useful for RAG and long-document use cases.

How often is the data refreshed?

Pricing is refreshed every 6 hours via an automated GitHub Actions workflow. The live badge at the top shows the timestamp of the last successful refresh.

If a live fetch fails, the page falls back to a recent snapshot, so you always see data, even if it is not from the very latest refresh.

Can I share my comparison?

Yes. Your active filter, search term, calculator inputs and pinned models are saved in the URL. Copy the address bar to share a deep link, or use the share button in the top bar to copy it to your clipboard.

Is AI Fees free to use?

Yes. The entire tool is free, with no signup and no ads on this page, and we don't collect personal data. AI Fees is operated as a public good by Philipp Stegmann and friends at deine-ai.com.

Glossary

Token

~4 characters or 0.75 English words. The atomic unit LLMs read and bill on.

Input token

A token in your prompt. Usually cheaper than output. Optimize by trimming system prompts and using shorter context.

Output token

A token the model generates. Usually 2–10× the input rate. Reasoning tokens count as output.

Context window

Maximum tokens per single request. A 200K window fits ~150K words. Affects what you can do per call, not the per-token rate.

Reasoning model

Model that thinks before responding (o-series, Claude with thinking, Gemini Thinking). Better answers, but hidden tokens count as output.

Tiered pricing

Per-token price changes at certain volume thresholds. Shown with the [tiered] tag.

Cached input

Discounted rate for repeated prompt prefixes, offered by Anthropic, OpenAI and Google. Not yet shown on this page.

Batch API

Async pricing tier (often 50% off) with longer turnaround. Good for non-realtime jobs.
