GPT-5 costs $1.25 per million input tokens and $10 per million output tokens in 2026, down from the $30/$60 GPT-4 charged at its March 2023 launch. That is a roughly 96% collapse in the input price of frontier intelligence (83% on output) in under three years.
That's the short answer. The longer answer is more interesting, because the rate of decline — about 80% per year — is the single most important number for anyone building an AI product, planning a margin model, or pricing a startup round around "AI is too expensive to be profitable." The cost curve says otherwise, and it says so consistently.
OpenAI API pricing in 2026: the full model table
OpenAI API pricing in 2026 is tiered by model capability, billed per 1 million tokens, split between input (what you send) and output (what the model generates). GPT-5 is the flagship at $1.25 input and $10 output per 1M tokens; the mini and nano tiers drop to a fraction of that for high-volume work. Output tokens cost 4–8x as much as input on every current tier (8x across the GPT-5 family, 4x on GPT-4o and o3), which is why response length, not prompt length, usually drives the bill.
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| GPT-5 | $1.25 | $10.00 | Frontier reasoning, agents, coding |
| GPT-5 mini | $0.25 | $2.00 | Balanced default for most apps |
| GPT-5 nano | $0.05 | $0.40 | Classification, routing, extraction |
| GPT-4o (legacy) | $2.50 | $10.00 | Multimodal, still widely deployed |
| GPT-4o mini (legacy) | $0.15 | $0.60 | Cheap legacy workloads |
| o3 (reasoning) | $2.00 | $8.00 | Hard math, science, planning |
| GPT-4 (2023 launch) | $30.00 | $60.00 | Reference point — retired |
Prices are per 1 million tokens. A token is roughly 4 characters or 0.75 words — so 1M tokens is about 750,000 words, or roughly ten full-length novels of text.
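To make the per-token math concrete, here is a minimal sketch that prices a single request from the table above. The dictionary keys are illustrative labels rather than confirmed API model identifiers, and the word-to-token conversion is only the 0.75 words-per-token rule of thumb.

```python
# USD per 1M tokens (input, output), copied from the pricing table above.
# The keys are illustrative labels, not confirmed API model identifiers.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD list-price cost of one request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def words_to_tokens(words: int) -> int:
    """Estimate tokens from a word count (~0.75 words per token)."""
    return round(words / 0.75)


# A 1,500-word prompt with a 300-word answer on GPT-5.
cost = request_cost("gpt-5", words_to_tokens(1500), words_to_tokens(300))
print(f"${cost:.4f}")  # prints $0.0065
```

In this example the answer is a fifth the length of the prompt, yet it accounts for about 60% of the cost ($0.0040 of $0.0065), which is the output-price asymmetry at work.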
What the OpenAI API cost curve looks like over time
The OpenAI API cost curve over time bends down roughly 80% per year for equivalent capability. GPT-4 launched at $30 input / $60 output per 1M tokens in March 2023. By mid-2024 GPT-4o delivered better quality at $2.50/$10 — a 12x input cut in 15 months. By 2026 GPT-5 mini matches early-2024 frontier quality at $0.25/$2, and nano does useful work at $0.05/$0.40. The intelligence you bought for $30 in 2023 now costs well under $1.
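If you want to sanity-check an annual rate yourself, a small sketch like the one below converts any two price points into an annualized decline. It assumes the decline compounds smoothly between the two dates, which is a simplification; the inputs are the GPT-4 and GPT-4o figures quoted above.

```python
def annual_decline(old_price: float, new_price: float, months: float) -> float:
    """Annualized fractional price decline implied by two price points.

    Assumes the price falls at a constant compound rate between the two dates.
    """
    yearly_ratio = (new_price / old_price) ** (12 / months)
    return 1 - yearly_ratio


# GPT-4 launch input price vs GPT-4o input price, 15 months apart (figures from the text).
rate = annual_decline(30.00, 2.50, 15)
print(f"{rate:.0%} per year")  # prints 86% per year
```

That single pair of data points lands a little above the ~80% headline rate, which is about what you would expect from a curve that moves in steps rather than smoothly.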
[Figure: frontier launch price per 1M tokens. Timeline labels, in order: 3x input cut, 128K context; multimodal, half the Turbo input cost; replaced GPT-3.5 at lower cost; reasoning models, cached-input discounts; flagship at half GPT-4o input cost; 600x cheaper input than 2023 GPT-4.]
The pattern is consistent enough to plan around. If you assume per-token costs for a fixed quality level fall ~70–80% annually, a workload that costs $50,000/month today should cost roughly $10,000–$15,000 a year from now — without you changing a line of code, just by migrating to whatever the cheaper-equivalent model is. I've watched portfolio companies model AI COGS as a fixed line item and miss this entirely. It's a declining line item.
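Here is a rough sketch of that projection, so you can drop your own numbers into a margin model. It assumes constant usage and a constant annual decline, neither of which is guaranteed; the function name is just illustrative.

```python
def project_monthly_cost(current: float, annual_decline: float, years: float = 1.0) -> float:
    """Project a monthly bill forward, assuming flat usage and a constant annual price decline."""
    return current * (1 - annual_decline) ** years


for decline in (0.70, 0.80):
    after_one_year = project_monthly_cost(50_000, decline)
    after_two_years = project_monthly_cost(50_000, decline, years=2)
    print(f"{decline:.0%}/yr: ${after_one_year:,.0f} -> ${after_two_years:,.0f}")
```

At a 70% decline the $50,000 bill becomes about $15,000 after one year, and at 80% about $10,000. Extending the same assumption to a second year gives roughly $4,500 and $2,000, which is why a flat COGS line in a multi-year model is so misleading.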
Why OpenAI API pricing keeps falling
Four forces compound to push the curve down. None of them are slowing in 2026:
- Hardware and utilization gains: Newer GPUs plus better batching and speculative decoding cut the raw compute per token by 30–50% a year.
- Model distillation: Small models like GPT-5 nano are trained to mimic frontier models, hitting 90%+ of the quality at 5% of the cost.
- Competition: Anthropic's Claude, Google's Gemini, and open-weight models force OpenAI to cut prices to hold market share.
- Inference-time optimization: Prompt caching, batching, and routing let providers serve the same demand on far less hardware.
The competitive piece matters most for builders. With AI model valuations stretched and several labs racing for the same enterprise seats, price is one of the few levers that moves a procurement decision. That dynamic keeps OpenAI honest on the API price sheet even as OpenAI's own valuation pushes past $300B.
How to reduce your OpenAI API bill in 2026
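You don't have to wait for the next price cut. Three levers stack to cut a production bill 80–90% today, and most teams use none of them: routing routine traffic to a smaller model such as nano, caching repeated input, and batching jobs that can wait.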
A worked example: an app doing 10M input and 2M output tokens a day on GPT-5 pays about $32.50/day, or ~$975/month. Move 70% of traffic to nano, cache 60% of input, and batch the overnight jobs, and the same workload lands near $120–$150/month. Same product, ~85% lower cost.
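The worked example can be sketched as a small bill model. The list prices come from the table above, but the two discount rates (90% off cached input, 50% off batched requests) and the idea that they stack multiplicatively are my assumptions for illustration, not figures from this article, so check them against the current price sheet before relying on the output.

```python
# USD per 1M tokens (input, output), from the pricing table above.
PRICES = {"gpt-5": (1.25, 10.00), "gpt-5-nano": (0.05, 0.40)}

# Assumed discounts, NOT stated in the article; verify against the current price sheet.
CACHED_INPUT_DISCOUNT = 0.90
BATCH_DISCOUNT = 0.50


def monthly_bill(input_m, output_m, nano_share=0.0, cached_share=0.0,
                 batch_share=0.0, days=30):
    """Monthly USD cost for daily volumes given in millions of tokens."""
    # Cached input tokens are billed at the discounted rate; the rest at list price.
    input_factor = 1 - cached_share * CACHED_INPUT_DISCOUNT
    daily = 0.0
    for model, share in (("gpt-5", 1 - nano_share), ("gpt-5-nano", nano_share)):
        input_price, output_price = PRICES[model]
        daily += share * (input_m * input_price * input_factor + output_m * output_price)
    # Batched traffic gets the batch discount on top (assumed to stack).
    return daily * days * (1 - batch_share * BATCH_DISCOUNT)


baseline = monthly_bill(10, 2)
optimized = monthly_bill(10, 2, nano_share=0.7, cached_share=0.6, batch_share=1.0)
print(f"baseline ${baseline:,.2f}, optimized ${optimized:,.2f}")
```

The baseline reproduces the ~$975/month figure. Under these assumed discounts, the optimized bill comes to about $127/month when all traffic can run through batch, and about $253/month with no batching at all, so landing in the $120–$150 range depends on most of the workload being batch-eligible.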
What this means for builders and investors
What gets easier
- ✓ Margins improve automatically as token costs fall ~80%/yr
- ✓ Features that were uneconomic in 2024 ship in 2026
- ✓ High-volume nano workloads at $0.05/1M unlock new use cases
- ✓ Caching and batching turn AI COGS into a tunable knob
What gets harder
- ✕ "We're cheaper" is not a moat — everyone's costs fall too
- ✕ Reselling raw tokens with a markup gets competed away
- ✕ Margin from model arbitrage erodes as prices converge
- ✕ Defensibility has to come from data, workflow, or distribution
For investors, the takeaway is simple: a startup whose entire pitch is "AI is expensive and we use it efficiently" has a thesis with a 12-month shelf life. The interesting companies treat the cost curve as a tailwind — they build products that only make sense once tokens are nearly free, and they put their moat somewhere the price sheet can't touch.
Frontier intelligence got ~25x cheaper in under three years — and the curve isn't bending up.
Price your product for where tokens will be, not where they are.
Track AI model pricing and valuation trends on the AI Valuations Dashboard at Value Add VC. Originally published in the Trace Cohen newsletter.