Question 1

How is this different from a "ChatGPT vs Claude" blog post?

Accepted Answer

Blog posts pick two models, write 1500 words of vibes, and never
update. This table compares 20+ current models on the columns that
actually drive a model pick — context window, input/output price
per million tokens, throughput, training cutoff, and three
capability scores (Chinese, code, reasoning) — and you can sort or
filter by any of them. The data has a visible "as-of" date so you
can tell when it was last refreshed instead of trusting a stale
listicle.
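As a rough illustration of what "sort or filter by any of them" looks like in practice, here is a minimal Python sketch over a handful of rows copied from the table. The `Row` class, the `shortlist` helper, and the conversion of "128K" to 128,000 tokens are assumptions made for the example, not the page's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    context_tokens: int  # assumption: "128K" -> 128_000, "1M" -> 1_000_000
    input_price: float   # USD per million input tokens
    code_score: float    # 0-10 code capability score


# A few rows copied from the table; the other columns are omitted for brevity.
ROWS = [
    Row("Yi-Lightning", 16_000, 0.14, 8.1),
    Row("Gemini 2.0 Flash", 1_000_000, 0.15, 8.2),
    Row("DeepSeek V3", 128_000, 0.27, 9.0),
    Row("GPT-4o", 128_000, 2.50, 8.3),
    Row("Claude Sonnet 4", 200_000, 3.00, 9.3),
]


def shortlist(rows, min_context, max_input_price):
    """Keep rows that meet both limits, best code score first."""
    kept = [
        r for r in rows
        if r.context_tokens >= min_context and r.input_price <= max_input_price
    ]
    return sorted(kept, key=lambda r: r.code_score, reverse=True)


picks = shortlist(ROWS, min_context=128_000, max_input_price=1.00)
print([r.model for r in picks])  # ['DeepSeek V3', 'Gemini 2.0 Flash']
```

Swapping `code_score` for another column in the sort key gives the equivalent of clicking a different column header.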

Question 2

Where do the prices and benchmark scores come from?

Accepted Answer

Pricing comes from each vendor's public API pricing page (OpenAI,
Anthropic, Google AI, Mistral, DeepSeek, Alibaba Cloud, Zhipu,
01.AI, Together AI for open-weight hosting). Throughput and
capability scores are aggregated from Artificial Analysis's public
benchmark dashboard plus our own internal eval pairs. We never
quote a price that is not on a public pricing page; if a model is
self-hosted only, we mark price as "self-hosted" and skip the per-
token number.

Question 3

Why three capability scores instead of one "overall" rank?

Accepted Answer

A single ranking would lie. The same model can be best at one task
and middle of the pack at another: Claude is strong at code,
Qwen and DeepSeek are strong at Chinese, o-series and Opus are
strong at reasoning. We split into Chinese, code, and reasoning
because those are the three axes where models actually diverge in
our own usage. Pick the column that matches what you're building.
If you want a single sortable number, the "best value" leaderboard
averages all three scores and divides by price.
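To see why one overall rank hides information, the following Python sketch picks the leader on each axis from four rows of the table. The `SCORES` dictionary and the `leader` helper are illustrative names, and the four rows are just a sample.

```python
# Scores copied from four rows of the table (Chinese, code, reasoning).
SCORES = {
    "Qwen 2.5 Max": {"chinese": 9.3, "code": 8.7, "reasoning": 8.6},
    "Claude Opus 4.7": {"chinese": 9.0, "code": 9.5, "reasoning": 9.5},
    "o3": {"chinese": 8.5, "code": 9.2, "reasoning": 9.6},
    "DeepSeek V3": {"chinese": 9.1, "code": 9.0, "reasoning": 8.7},
}


def leader(scores, axis):
    """Return the model with the highest score on one axis."""
    return max(scores, key=lambda model: scores[model][axis])


for axis in ("chinese", "code", "reasoning"):
    print(axis, "->", leader(SCORES, axis))
# chinese -> Qwen 2.5 Max
# code -> Claude Opus 4.7
# reasoning -> o3
```

Each axis has a different leader in this sample, which is the disagreement a single combined rank would flatten.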

Question 4

What does the "best value" leaderboard actually compute?

Accepted Answer

It is capability-per-dollar. We average the three capability
scores (Chinese, code, reasoning) to get a 0-10 capability number,
then divide by the blended price (input price weighted 0.7,
output price weighted 0.3 — roughly matching real prompt/completion
ratios). Higher number means more capability per dollar. Open-weight
models with self-hosted pricing are excluded from this leaderboard
because their cost depends on your hardware, not a per-token rate.
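The calculation described above can be sketched in a few lines of Python. The function name `best_value`, the rounding to two decimals, and the use of `None` for self-hosted prices are assumptions for the example. The weights and the averaging come from the answer, and the results match the last column of the table for the rows shown.

```python
from typing import Optional

INPUT_WEIGHT = 0.7   # share of the blended price taken from the input price
OUTPUT_WEIGHT = 0.3  # share taken from the output price


def best_value(
    chinese: float,
    code: float,
    reasoning: float,
    input_price: Optional[float],
    output_price: Optional[float],
) -> Optional[float]:
    """Capability per dollar; None when the row has no per-token price."""
    if input_price is None or output_price is None:
        return None  # self-hosted rows are excluded from the leaderboard
    capability = (chinese + code + reasoning) / 3
    blended = INPUT_WEIGHT * input_price + OUTPUT_WEIGHT * output_price
    return round(capability / blended, 2)


print(best_value(8.9, 8.1, 8.0, 0.14, 0.14))  # Yi-Lightning: 59.52
print(best_value(8.3, 8.2, 8.0, 0.15, 0.60))  # Gemini 2.0 Flash: 28.65
print(best_value(7.4, 8.0, 8.1, None, None))  # Llama 3.3 70B: None
```

Prices are in USD per million tokens, so the score is only meaningful for comparing rows against each other, not as an absolute quantity.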

Question 5

Does this comparison cost anything or send my data anywhere?

Accepted Answer

No. The whole table is a static data array that ships with the
page. No API call, no analytics on what you sort by, no account
required. You can use it on a phone with no signal once the page
has loaded. We do not have a "premium" tier that unlocks more
models — every model we track is visible to everyone.

Question 6

How often is the data refreshed?

Accepted Answer

We update on two triggers: when a vendor changes a price
(we track these changes) and when a new model with a significant
capability or pricing change is released. The snapshot date is
printed at the top of the table so you can see at a glance how
fresh or stale the data is. If you need real-time pricing
for billing, always re-check the vendor page. We are a reference,
not the source of truth for invoicing.

Question 7

Why are some prices marked "self-hosted"?

Accepted Answer

Some open-weight models have no official per-token API price from
the model maker. For those (the Llama rows in the table), you
either host them yourself or use a third-party host (Together,
Fireworks, Groq, DeepInfra) that sets its own rate. Rather than
pin a single third-party rate that could mislead, we mark these
"self-hosted" and let you click through to the model card.
Open-weight models whose makers do publish an API price, such as
the DeepSeek, Qwen and Mistral rows, keep that price. The
context window, capability scores and throughput are still
benchmarked so the row stays useful.
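If you copy the table into your own script, the price cells need a little care because they hold either a dollar amount or the "self-hosted" marker. A minimal Python sketch, assuming a hypothetical `parse_price` helper and cell text formatted as it appears in the table:

```python
from typing import Optional


def parse_price(cell: str) -> Optional[float]:
    """Turn a price cell such as "$0.27" into a float, or None if self-hosted."""
    text = cell.strip()
    if text.lower() == "self-hosted":
        return None  # no per-token rate; cost depends on your own hardware
    return float(text.lstrip("$").replace(",", ""))


print(parse_price("$0.27"))        # 0.27
print(parse_price("$10"))          # 10.0
print(parse_price("self-hosted"))  # None
```

Returning `None` rather than 0 keeps self-hosted rows from looking free when you sort by price.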

Question 8

Is this list complete? What models are missing?

Accepted Answer

It covers the 20+ models people actually pick between in 2026 for
general use. We deliberately skip: research-only previews with no
stable API, region-locked models without English-language docs,
and "fine-tunes of a fine-tune" that do not change underlying
capability. If a model you use is missing and has a public price and
benchmark, file an issue. We add it if it clears our quality
bar, which is stricter than just "exists."


Model	Vendor	Context	Input $/1M tokens	Output $/1M tokens	Throughput	Training cutoff	Chinese	Code	Reasoning	Best value
Yi-Lightning	01.AI	16K	$0.14	$0.14	130	Jun 2024	8.9	8.1	8.0	59.52
Gemini 2.0 Flash	Google	1M	$0.15	$0.60	180	Aug 2024	8.3	8.2	8.0	28.65
DeepSeek V3 (OW)	DeepSeek	128K	$0.27	$1.10	60	Dec 2024	9.1	9.0	8.7	17.21
Qwen 2.5 72B (OW)	Alibaba	128K	$0.40	$1.20	65	Sep 2024	9.2	8.5	8.3	13.54
DeepSeek R1 (OW)	DeepSeek	128K	$0.55	$2.19	40	Dec 2024	9.0	9.1	9.3	8.77
GLM-4 Plus	Zhipu	128K	$0.70	$2.10	75	Aug 2024	9.0	8.4	8.2	7.62
Gemini 2.0 Pro	Google	2M	$1.25	$5.00	95	Nov 2024	8.7	8.7	8.8	3.68
Claude Haiku 4.5	Anthropic	200K	$1.00	$5.00	140	Feb 2025	7.9	8.1	7.8	3.61
Qwen 2.5 Max	Alibaba	32K	$1.60	$6.40	70	Oct 2024	9.3	8.7	8.6	2.92
Mistral Large 2 (OW)	Mistral	128K	$2.00	$6.00	60	Jul 2024	7.6	8.3	8.2	2.51
GPT-4.1	OpenAI	1M	$2.00	$8.00	110	May 2024	8.5	8.9	8.6	2.28
GPT-4o	OpenAI	128K	$2.50	$10	90	Oct 2023	8.4	8.3	8.1	1.74
Claude Sonnet 4	Anthropic	200K	$3.00	$15	80	Jan 2025	8.6	9.3	8.9	1.35
ERNIE 4.0 Turbo	Baidu	128K	$4.00	$12	60	May 2024	9.1	7.9	8.1	1.31
Grok 3	xAI	128K	$3.00	$15	70	Nov 2024	8.0	8.6	8.9	1.29
GPT-4 Turbo	OpenAI	128K	$10	$30	35	Dec 2023	7.8	8.2	8.4	0.51
o3	OpenAI	200K	$10	$40	30	Jun 2024	8.5	9.2	9.6	0.48
o1	OpenAI	200K	$15	$60	25	Oct 2023	8.2	8.8	9.4	0.31
Claude Opus 4.7	Anthropic	1M	$15	$75	55	Mar 2025	9.0	9.5	9.5	0.28
Llama 3.3 70B (OW)	Meta	128K	self-hosted	self-hosted	70	Jul 2024	7.4	8.0	8.1	—
Llama 4 (OW)	Meta	1M	self-hosted	self-hosted	80	Feb 2025	8.1	8.5	8.6	—

OW = open-weight.

AI Model Comparison — 20+ LLMs Across Price, Context, Speed & Capabilities (2026)

Leaderboards

What this tool does

Tool details

How to use

1. Input

2. Process

3. Copy / Download

How AI Model Comparison fits into your work

AI workflow jobs

AI checks

Good next steps

FAQ

Tool combos
