Which AI API is cheapest for high-volume workloads?

The cheapest AI API depends on input/output mix, but low-cost models from Gemini, DeepSeek, Groq, Cohere, and hosted Llama providers often rank well for high-volume classification, routing, extraction, and summarization.

When is API pricing cheaper than an AI subscription?

API pricing is often cheaper for light or automated workloads with predictable token use. A flat subscription can be simpler for heavy interactive use, bundled product features, or unlimited-style consumer plans.

What workloads does the calculator compare?

The calculator groups models by practical workloads: lowest monthly cost, coding agents, research and long documents, RAG or support bots, realtime apps, and frontier-quality reasoning.

Data reviewed 2026-07-17

AI API Pricing Calculator 2026

Estimate monthly API spend by workload: coding agents, research, RAG support bots, realtime apps, and low-cost automation across OpenAI, Anthropic, Google, xAI, DeepSeek, Mistral, and more.

Cheapest overall

DeepSeek V4 Flash

$0.14 / $0.28 per MTok

Best value general

Gemini 2.5 Flash

$0.30 / $2.50 per MTok

Best for coding

Mistral Codestral

$0.30 / $0.90 per MTok

Most powerful

Claude Opus 4.8

$5.00 / $25.00 per MTok

🌍 Your country:

No Sales Tax (varies)

ℹ️ No federal VAT. State sales tax on digital services varies (0–10%). Most AI providers do n…

⚠️ Tax rates and exchange rates are approximate and may vary. Actual charges depend on your payment method, billing country, and current exchange rates. Prices shown are estimates only — verify with your provider before subscribing.

💰 Monthly Cost Calculator

Input tokens / request

~750 words ≈ 1K tokens

Output tokens / request

~375 words ≈ 500 tokens

Requests per day

= 900/month

Apply Batch API pricing (discount varies by provider — async 24hr processing)

Your usage estimate

0.90M input tokens/mo

0.45M output tokens/mo

900 requests/mo
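These totals follow from multiplying the per-request token counts by the monthly request count. A minimal Python sketch, assuming a 30-day month and 30 requests per day (the daily value is not shown above; 30 is simply what yields 900 requests/month). The function and parameter names are illustrative and not part of the calculator:

```python
def monthly_usage(input_tokens_per_request: int,
                  output_tokens_per_request: int,
                  requests_per_day: int,
                  days_per_month: int = 30) -> dict:
    """Return monthly request count and token totals (tokens in millions)."""
    requests = requests_per_day * days_per_month
    return {
        "requests": requests,
        "input_mtok": input_tokens_per_request * requests / 1_000_000,
        "output_mtok": output_tokens_per_request * requests / 1_000_000,
    }


# 1K input tokens and 500 output tokens per request, 30 requests per day (assumed).
usage = monthly_usage(1_000, 500, 30)
print(usage)
```

With these inputs the sketch gives 900 requests, 0.90M input tokens, and 0.45M output tokens per month, matching the estimate above.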

⚠️ This estimate excludes:

Taxes, tool/function call costs, web search fees, image/audio input costs, cached write costs, long-context surcharges, and minimum request unit rounding.

Prompt caching can reduce input costs up to 90% for repeated prompts. Actual costs may be 20–50% higher depending on tool/search/image usage.
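As a rough illustration of the caching note, the sketch below blends standard and cached input rates for a given cache-hit share. It assumes the Gemini 2.5 Flash-Lite rates listed on this page ($0.10 standard, $0.01 cached per 1M input tokens); the function name and the cache-hit share are hypothetical, and cache-write fees are ignored, as they are in the estimate above:

```python
def input_cost_with_caching(input_mtok: float,
                            standard_rate: float,
                            cached_rate: float,
                            cache_hit_share: float) -> float:
    """Monthly input cost in USD when part of the input is served from cache.

    Rates are USD per 1M tokens; cache_hit_share is between 0 and 1.
    Cache-write fees are not modeled.
    """
    if not 0.0 <= cache_hit_share <= 1.0:
        raise ValueError("cache_hit_share must be between 0 and 1")
    uncached = input_mtok * (1.0 - cache_hit_share) * standard_rate
    cached = input_mtok * cache_hit_share * cached_rate
    return uncached + cached


# 0.90M input tokens/month at the assumed Gemini 2.5 Flash-Lite rates.
no_cache = input_cost_with_caching(0.90, 0.10, 0.01, 0.0)
all_cached = input_cost_with_caching(0.90, 0.10, 0.01, 1.0)
saving = 1.0 - all_cached / no_cache
print(f"no cache: ${no_cache:.3f}, fully cached: ${all_cached:.3f}, saving: {saving:.0%}")
```

Only when every input token is a cache hit does the saving reach the 90% ceiling; a real workload with a partly repeated prompt lands somewhere below it.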

What this usage means

Quick read before you scan every model row. Costs assume standard pricing only; batch and cached-input discounts are not applied unless toggled above.

API likely cheaper

Your cheapest API option is about $0.08/mo, below a typical $20 subscription.

Moderate workload

900 requests/month using 1.35M total tokens.

Best price in view

Llama 3.1 8B Instant (Groq) at $0.08/mo.
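The $0.08/mo figure can be reproduced from the per-token rates. A minimal sketch, assuming the Llama 3.1 8B Instant rates listed for Groq on this page ($0.05 input, $0.08 output per 1M tokens) and standard pricing with no discounts; `monthly_cost` is an illustrative name, not the calculator's own code:

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 input_rate: float, output_rate: float) -> float:
    """Monthly API cost in USD; volumes in millions of tokens, rates in USD per 1M."""
    return input_mtok * input_rate + output_mtok * output_rate


SUBSCRIPTION_USD = 20.0  # the "typical $20 subscription" used for comparison

cost = monthly_cost(0.90, 0.45, input_rate=0.05, output_rate=0.08)
verdict = "API likely cheaper" if cost < SUBSCRIPTION_USD else "subscription likely cheaper"
print(f"${cost:.2f}/mo -> {verdict}")
```

Here 0.90 × $0.05 + 0.45 × $0.08 = $0.081, which rounds to the $0.08/mo shown.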

Workload:

Show every token-priced model in the calculator.

Recommended APIs for this workload

These cards use the selected workload and provider filters, then rank models by monthly cost, quality fit, and production fit.

Cheapest

Lowest bill

Llama 3.1 8B Instant

Groq · $0.08/mo

Start here when the workload is repetitive, simple, or high volume.

Quality

Best output quality

Gemini 3.1 Pro Preview

Google (Gemini API) · $7/mo

Use this when accuracy, reasoning, writing quality, or hard coding tasks matter more than raw price.

Production

Best app backend

Gemini 2.5 Flash-Lite

Google (Gemini API) · $0.27/mo

A practical candidate for RAG, routing, support chat, agents, or cached/batch workloads.

Model	Notes	Provider	Use case fit	Est. Monthly Cost	vs $20 Sub	Input / 1M	Output / 1M	Cached / 1M	Context	Best for
Llama 3.1 8B Instant	20M input tokens per dollar. Fastest inference available for this model size.	Groq	Realtime, Lowest cost	$0.08 /month	No sub	$0.05 /1M	$0.08 /1M	n/a	128K	Ultra-fast simple tasks at 500+ tok/s: latency-sensitive classification and routing
Command R7B	Cheapest flagship-quality API as of June 2026. 4x cheaper than GPT-5.4 Nano on i…	Cohere	RAG, Lowest cost, Realtime	$0.10 /month	No sub	$0.0375 /1M	$0.15 /1M	n/a	128K	High-volume RAG, classification, routing; cheapest first-party production API
Llama 4 Scout (DeepInfra)	DeepInfra hosted meta-llama/Llama-4-Scout-17B-16E-Instruct.	Meta Llama (hosted via DeepInfra)	Lowest cost, Realtime, Research	$0.23 /month	No sub	$0.1 /1M	$0.3 /1M	n/a	327,680	Efficient MoE model: fast and cheap for classification, chat, light coding tasks
Llama 3.3 70B Turbo (DeepInfra)	Current DeepInfra Turbo endpoint; replaces the deprecated non-Turbo endpoint pre…	Meta Llama (hosted via DeepInfra)	Lowest cost, Realtime, Coding	$0.23 /month	No sub	$0.1 /1M	$0.32 /1M	n/a	128K	Cheapest hosted 70B model: best cost/quality for general tasks
DeepSeek V4 Flash	One of the cheapest frontier-class APIs available. Strong on coding benchmarks.	DeepSeek	Lowest cost, Coding, Realtime	$0.25 /month	No sub	$0.14 /1M	$0.28 /1M	$0.0028 /1M	1M	Budget coding tasks, high-volume text generation, agentic pipelines where cost matters most
Gemini 2.5 Flash-Lite	Cheapest model in Google lineup. Flat pricing on all context lengths.	Google (Gemini API)	Lowest cost, Realtime, RAG	$0.27 /month	API cheaper, save $20/mo	$0.1 /1M	$0.4 /1M	$0.01 /1M	1M	Ultra-high-volume simple tasks: classification, routing, lightweight extraction
Mistral Small 4	Strong price-performance for EU-based applications.	Mistral AI	Lowest cost, Realtime	$0.41 /month	API cheaper, save $20/mo	$0.15 /1M	$0.6 /1M	n/a	256K	EU deployments, budget general tasks; French company with EU data residency
Command R	Native grounding and citation generation. Strong for retrieval-heavy workloads.	Cohere	RAG, Lowest cost, Research	$0.41 /month	No sub	$0.15 /1M	$0.6 /1M	n/a	128K	RAG pipelines, cost-efficient enterprise chat with native grounding and tool use
GPT-OSS 120B	Current GroqCloud replacement for the retired Kimi K2 endpoint.	Groq	Realtime, Lowest cost	$0.41 /month	No sub	$0.15 /1M	$0.6 /1M	n/a	128K	Fast open-weight reasoning and agentic workloads on GroqCloud
Llama 4 Maverick FP8 (DeepInfra)	DeepInfra hosted meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8.	Meta Llama (hosted via DeepInfra)	Coding, Lowest cost, Research	$0.54 /month	No sub	$0.2 /1M	$0.8 /1M	n/a	1M	Best open-weight model for coding and reasoning; strong competitor to GPT-5.4
Codestral	Purpose-built coding model. Very competitive pricing vs OpenAI/Anthropic for cod…	Mistral AI	Coding, Lowest cost	$0.68 /month	API cheaper, save $19/mo	$0.3 /1M	$0.9 /1M	n/a	128K	Code completion, FIM (fill-in-the-middle), IDE autocomplete, coding-specific tasks; purpose-built for code
GPT-5.4 Nano	Cheapest OpenAI model for production. Official short-context standard pricing.	OpenAI	Lowest cost, Realtime, RAG	$0.74 /month	API cheaper, save $19/mo	$0.2 /1M	$1.25 /1M	$0.02 /1M	400K	Ultra-high-volume simple tasks: classification, intent detection, short-form extraction
DeepSeek V4 Pro	Current official price from DeepSeek API docs. Legacy deepseek-chat and deepseek…	DeepSeek	Coding, Lowest cost	$0.78 /month	No sub	$0.435 /1M	$0.87 /1M	$0.003625 /1M	1M	Complex coding and reasoning at significantly lower cost than OpenAI/Anthropic equivalents
Llama 3.3 70B Versatile	~250 tok/s. Undercuts GPT-4o mini on output cost while offering flagship-class c…	Groq	Realtime, Lowest cost, Coding	$0.89 /month	No sub	$0.59 /1M	$0.79 /1M	n/a	128K	Flagship-quality at low cost: coding, QA, summarization on Groq LPU infrastructure
Gemini 3.1 Flash-Lite	Current cost-efficient Gemini 3 model. Batch/Flex standard text pricing is $0.12…	Google (Gemini API)	Lowest cost, Realtime, RAG	$0.90 /month	API cheaper, save $19/mo	$0.25 /1M	$1.5 /1M	$0.025 /1M	1M	High-volume agentic tasks, translation, lightweight data processing
Mistral Large 3	Flagship Mistral model. Good alternative to GPT-4o for EU deployments.	Mistral AI	Quality, Research, Lowest cost	$1 /month	API cheaper, save $19/mo	$0.5 /1M	$1.5 /1M	n/a	256K	Complex reasoning, EU compliance-sensitive tasks, multilingual European deployments
Sonar	Search included at no extra charge. ~$5-12 per 1,000 requests additional based o…	Perplexity (Sonar API)	Research, RAG	$1 /month	No sub	$1 /1M	$1 /1M	n/a	127K	Web-grounded Q&A with citations; cheapest model with built-in real-time search
Gemini 2.5 Flash	Best price-performance in current Gemini lineup. Free tier available.	Google (Gemini API)	Realtime, RAG, Lowest cost	$1 /month	API cheaper, save $19/mo	$0.3 /1M	$2.5 /1M	$0.03 /1M	1M	High-volume general tasks: summarization, extraction, coding assistance, rapid prototyping
grok-build-0.1	Early-access coding model listed on xAI API pricing.	xAI (Grok API)	Coding, Realtime	$2 /month	API cheaper, save $18/mo	$1 /1M	$2 /1M	$0.2 /1M	256K	Agentic coding and build tasks on xAI infrastructure
Kimi K2.5	Official Kimi API price: cache hit $0.10, cache miss $0.60, output $3.00 per MTo…	Moonshot AI (Kimi)	Coding, Lowest cost, Research	$2 /month	API cheaper, save $18/mo	$0.6 /1M	$3 /1M	$0.1 /1M	256K	Budget coding and multimodal tasks; cheapest vision-capable model in this comparison
Grok 4.3	Current xAI flagship text model.	xAI (Grok API)	Quality, Research, Realtime	$2 /month	API cheaper, save $18/mo	$1.25 /1M	$2.5 /1M	$0.2 /1M	1M	General reasoning, tool calling, long-context applications
Kimi K2.6	Official Kimi API price: cache hit $0.16, cache miss $0.95, output $4.00 per MTo…	Moonshot AI (Kimi)	Coding, Research	$3 /month	API cheaper, save $17/mo	$0.95 /1M	$4 /1M	$0.16 /1M	256K	Coding and agentic tasks at 4× lower cost than Claude Sonnet; strong coding benchmark scores
GPT-5.4 Mini	Mid-range GPT-5.4 family model. Official short-context standard pricing.	OpenAI	Coding, Research	$3 /month	API cheaper, save $17/mo	$0.75 /1M	$4.5 /1M	$0.075 /1M	400K	Lower-cost coding, long-form summarization, and medium-complexity agent steps
Claude Haiku 4.5	Cheapest current-gen Claude. Replaces deprecated Haiku 3 ($0.25/$1.25).	Anthropic	Realtime, RAG	$3 /month	API cheaper, save $17/mo	$1 /1M	$5 /1M	$0.1 /1M	200K	Classification, routing, extraction, summarization, high-volume workloads
GPT-5.6 Luna	n/a	OpenAI	Realtime, Coding	$4 /month	API cheaper, save $16/mo	$1 /1M	$6 /1M	$0.1 /1M	1.05M	Fastest and cheapest GPT-5.6; good for high-volume, latency-sensitive tasks
Moonshot V1 (128K)	Legacy model. Official price is $2 input/$5 output per MTok; platform sunset exp…	Moonshot AI (Kimi)	Research	$4 /month	API cheaper, save $16/mo	$2 /1M	$5 /1M	n/a	128K	Legacy model, superseded by K2.5 and K2.6. Listed for reference only.
Gemini 2.5 Pro	Best value for complex tasks in Gemini lineup. 2x surcharge beyond 200K.	Google (Gemini API)	Research, Coding, Quality	$6 /month	API cheaper, save $14/mo	$1.25 /1M	$10 /1M	$0.125 /1M	1M	Complex reasoning, coding, long-document analysis; strong at 1M context tasks
Claude Sonnet 5	Introductory pricing through 2026-08-31; standard pricing is $3/$15 per MTok the…	Anthropic	Coding, Research	$6 /month	API cheaper, save $14/mo	$2 /1M	$10 /1M	$0.2 /1M	1M	High-performance coding, agents, and general production workloads during introductory pricing
Command R+	Same price as GPT-5.4 on input, 33% cheaper on output. Purpose-built for RAG bea…	Cohere	RAG, Research, Quality	$7 /month	No sub	$2.5 /1M	$10 /1M	n/a	128K	Complex agentic RAG, enterprise document intelligence, multi-step grounded reasoning
Gemini 3.1 Pro Preview	Preview model. Standard price for prompts up to 200K tokens; prompts above 200K …	Google (Gemini API)	Quality, Research, Coding	$7 /month	API cheaper, save $13/mo	$2 /1M	$12 /1M	$0.2 /1M	1M	High-quality multimodal reasoning, agentic workflows, vibe-coding
GPT-5.4	Strong balance of capability and cost. Official short-context standard pricing; …	OpenAI	Coding, Research	$9 /month	API cheaper, save $11/mo	$2.5 /1M	$15 /1M	$0.25 /1M	1.05M	Coding (debugging, code gen, refactoring), content generation, analysis; solid all-rounder
GPT-5.6 Terra	n/a	OpenAI	Coding, Research	$9 /month	API cheaper, save $11/mo	$2.5 /1M	$15 /1M	$0.25 /1M	1.05M	Balanced quality and cost; good daily driver for most tasks
Claude Sonnet 4.6	Legacy/current comparison point. Sonnet 5 is cheaper during introductory pricing…	Anthropic	Coding, Research, RAG	$9 /month	API cheaper, save $11/mo	$3 /1M	$15 /1M	$0.3 /1M	1M	General coding, analysis, writing, RAG pipelines, agentic tasks
Sonar Pro	$6-14 per 1,000 requests additional. Use when citation accuracy and source depth…	Perplexity (Sonar API)	Research, RAG	$9 /month	No sub	$3 /1M	$15 /1M	n/a	127K	Deep research with multi-source citations; complex questions requiring current web data
Claude Opus 4.8	Current Opus model. Fast Mode is 2x standard pricing.	Anthropic	Quality, Coding, Research	$16 /month	API cheaper, save $4/mo	$5 /1M	$25 /1M	$0.5 /1M	1M	Complex reasoning, agentic workflows, code review, long-context analysis
GPT-5.5	Flagship model. Official short-context standard pricing; long-context and Priori…	OpenAI	Quality, Coding, Research	$18 /month	Break-even	$5 /1M	$30 /1M	$0.5 /1M	1.05M	Complex multi-step workflows, frontier writing quality, tool-use intensive agents
GPT-5.6 Sol	n/a	OpenAI	Quality, Coding, Research	$18 /month	Break-even	$5 /1M	$30 /1M	$0.5 /1M	1.05M	Flagship: highest quality for complex reasoning, coding, and research
Claude Fable 5	Top-tier Claude model listed in official pricing. More expensive than Opus 4.8.	Anthropic	Quality, Research, Coding	$32 /month	Sub cheaper, $12/mo over	$10 /1M	$50 /1M	$1 /1M	1M	Frontier reasoning, long-context agentic workflows, high-stakes analysis
Claude Mythos 5	Limited availability. Official Claude Platform pricing is $10/$50 per MTok.	Anthropic	Quality, Research	$32 /month	Sub cheaper, $12/mo over	$10 /1M	$50 /1M	$1 /1M	1M	Advanced reasoning and long-context work where limited availability is acceptable
GPT-5.5 Pro	Premium model. Official pricing is $30/$180 per MTok; no cached-input discount.	OpenAI	Quality, Research	$108 /month	Sub cheaper, $88/mo over	$30 /1M	$180 /1M	n/a	1.05M	Most demanding frontier tasks where quality matters more than cost

Prices shown are standard per-token rates. Batch, cached, priority, and long-context rates vary. Verify current pricing at each provider's official API documentation.
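To show how the estimated monthly cost orders the rows, the sketch below recomputes it for a handful of models at 0.90M input and 0.45M output tokens per month and sorts ascending. The rates are copied from the table; the data structure and variable names are illustrative, and only standard rates are applied:

```python
INPUT_MTOK, OUTPUT_MTOK = 0.90, 0.45  # monthly volume, millions of tokens

# (model, input USD per 1M, output USD per 1M), copied from the table
RATES = [
    ("Claude Opus 4.8", 5.00, 25.00),
    ("Llama 3.1 8B Instant", 0.05, 0.08),
    ("GPT-5.5 Pro", 30.00, 180.00),
    ("DeepSeek V4 Flash", 0.14, 0.28),
    ("Gemini 2.5 Flash-Lite", 0.10, 0.40),
]

# Compute each model's monthly cost, then sort from cheapest to most expensive.
ranked = sorted(
    ((name, INPUT_MTOK * inp + OUTPUT_MTOK * out) for name, inp, out in RATES),
    key=lambda row: row[1],
)

for name, cost in ranked:
    print(f"{name:<24} ${cost:>7.2f} /month")
```

The table rounds larger amounts to whole dollars, so the $15.75 computed here for Claude Opus 4.8 appears there as $16.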

Prices exclude applicable taxes. Kimi K2.6 and K2.5 prices were extracted from the official page source because the rendered table parser omitted the numeric cells. Moonshot V1 platform sunset is expected on 2026-08-31.

API vs subscription: which is cheaper for you?

At roughly 2,000–2,200 interactions/month, the Claude Sonnet API and a Claude Pro subscription cost about the same. Below that, the API wins; above it, the flat $20/month Pro subscription is cheaper. Read our full breakdown:

API vs Subscription: When does pay-per-token save money? →
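The break-even point is the subscription price divided by the API cost of one interaction. A minimal sketch, assuming the Claude Sonnet 4.6 rates from the table ($3 input, $15 output per 1M tokens) and the calculator's default request size of 1,000 input and 500 output tokens. Those token counts are an assumption, not necessarily the ones behind the 2,000–2,200 figure, so the result lands somewhat lower:

```python
def break_even_requests(subscription_usd: float,
                        input_tokens: int, output_tokens: int,
                        input_rate: float, output_rate: float) -> float:
    """Requests per month at which API spend equals the subscription price.

    Rates are USD per 1M tokens; token counts are per request.
    """
    cost_per_request = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return subscription_usd / cost_per_request


requests = break_even_requests(20.0, 1_000, 500, input_rate=3.0, output_rate=15.0)
print(f"Break-even at about {requests:,.0f} requests/month")
```

Smaller interactions push the break-even point up, and larger ones pull it down, so the threshold is best recomputed with your own token counts.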

All 12 providers and 42 listed models rechecked against first-party pricing pages on 2026-07-17. Corrected OpenAI, Google, Anthropic, Mistral, Meta-hosted, Groq, and FLUX records; model-level verification metadata is complete. Workload tags were added on 2026-07-17 as a first-pass editorial classification for API pricing UX. Cost calculator uses standard pricing; batch and caching discounts apply separately. Actual costs may vary based on model routing, context length, and feature usage.