AI & Technology · June 2026 · 11 min read · Last updated: June 2026

OpenAI API Pricing in 2026: What the Cost Curve Actually Looks Like Over Time

GPT-5 costs $1.25 per million input tokens and $10 per million output tokens in 2026, a roughly 95% drop in input price from GPT-4's 2023 launch. The per-model breakdown, the trend line, and what it means for anyone building on the API.

Trace Cohen
Co-Founder & GP at Six Point Ventures · 3x founder (BrandYourself, Launch.it, SPOT) · 65+ investments · Based in Boca Raton, FL

Quick Answer

$1.25 per 1M input tokens and $10 per 1M output tokens is OpenAI's GPT-5 API pricing in 2026. GPT-5 mini runs $0.25/$2 and GPT-5 nano $0.05/$0.40 per 1M tokens. Prices for equivalent capability have fallen roughly 80% per year since GPT-4 launched at $30/$60 in 2023, and the flagship's input price alone is about 24x lower than it was under three years ago.

GPT-5 costs $1.25 per million input tokens and $10 per million output tokens in 2026, down from the $30/$60 GPT-4 charged at its March 2023 launch. That is a roughly 95% collapse in the input price of frontier intelligence (about 83% on output) in under three years.

That's the short answer. The longer answer is more interesting, because the rate of decline — about 80% per year — is the single most important number for anyone building an AI product, planning a margin model, or pricing a startup round around "AI is too expensive to be profitable." The cost curve says otherwise, and it says so consistently.

OpenAI API pricing in 2026: the full model table

OpenAI API pricing in 2026 is tiered by model capability, billed per 1 million tokens, split between input (what you send) and output (what the model generates). GPT-5 is the flagship at $1.25 input and $10 output per 1M tokens; the mini and nano tiers drop to a fraction of that for high-volume work. Output tokens cost 4–8x as much as input on every current tier, which is why response length, not prompt length, usually drives the bill.

Model | Input / 1M tokens | Output / 1M tokens | Best for
GPT-5 | $1.25 | $10.00 | Frontier reasoning, agents, coding
GPT-5 mini | $0.25 | $2.00 | Balanced default for most apps
GPT-5 nano | $0.05 | $0.40 | Classification, routing, extraction
GPT-4o (legacy) | $2.50 | $10.00 | Multimodal, still widely deployed
GPT-4o mini (legacy) | $0.15 | $0.60 | Cheap legacy workloads
o3 (reasoning) | $2.00 | $8.00 | Hard math, science, planning
GPT-4 (2023 launch) | $30.00 | $60.00 | Reference point (retired)

Prices are per 1 million tokens. A token is roughly 4 characters or 0.75 words — so 1M tokens is about 750,000 words, or roughly ten full-length novels of text.
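To make the per-token billing concrete, here is a minimal sketch of a per-request cost calculator using the list prices from the table above. The dictionary keys are illustrative labels chosen for this example, not necessarily the model identifiers the API expects, and the function ignores caching and batch discounts.

```python
# Prices in USD per 1M tokens, copied from the table above: (input, output).
# The keys are hypothetical labels for this sketch, not official model IDs.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at list price (no discounts)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # A 1,000-token prompt with a 500-token answer on each tier.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 1_000, 500):.6f}")
```

On these numbers the 500 output tokens account for $0.005 of the $0.00625 GPT-5 request, which is the "response length drives the bill" point in practice.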

What the OpenAI API cost curve looks like over time

The OpenAI API cost curve bends down roughly 80% per year for equivalent capability. GPT-4 launched at $30 input / $60 output per 1M tokens in March 2023. By mid-2024 GPT-4o delivered better quality at $2.50/$10, a 12x input cut in under a year and a half. By 2026 GPT-5 mini matches early-2024 frontier quality at $0.25/$2, and nano does useful work at $0.05/$0.40. The intelligence you bought for $30 in 2023 now costs well under $1.

Date | Model | Price per 1M tokens (input / output) | What changed
Mar 2023 | GPT-4 | $30 / $60 | Frontier launch price
Nov 2023 | GPT-4 Turbo | $10 / $30 | 3x input cut, 128K context
May 2024 | GPT-4o | $2.50 / $10 | Multimodal, a quarter of the Turbo input cost
Jul 2024 | GPT-4o mini | $0.15 / $0.60 | Replaced GPT-3.5 at lower cost
2025 | GPT-4.1 / o-series | $2 / $8 | Reasoning models, cached-input discounts
2026 | GPT-5 | $1.25 / $10 | Flagship at half GPT-4o input cost
2026 | GPT-5 nano | $0.05 / $0.40 | 600x cheaper input than 2023 GPT-4
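One way to sanity-check the "roughly 80% per year" figure is to annualize the drop between any two points on the timeline. The sketch below assumes a constant compound rate between the two dates, which is a simplification, and the month counts are read off the timeline's dates rather than stated by it.

```python
def annual_decline(start_price: float, end_price: float, months: float) -> float:
    """Annualized fractional price decline, assuming a constant compound rate."""
    yearly_ratio = (end_price / start_price) ** (12 / months)
    return 1 - yearly_ratio


if __name__ == "__main__":
    # GPT-4 (Mar 2023, $30 input) to GPT-4 Turbo (Nov 2023, $10 input): 8 months.
    print(f"GPT-4 to GPT-4 Turbo: {annual_decline(30.0, 10.0, 8):.0%} per year")
    # GPT-4 (Mar 2023, $30 input) to GPT-4o ($2.50 input, dated May 2024 above): 14 months.
    print(f"GPT-4 to GPT-4o: {annual_decline(30.0, 2.50, 14):.0%} per year")
```

Both steps work out to an annualized input-price decline in the 80–90% range, in line with the rate quoted above.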

The pattern is consistent enough to plan around. If you assume per-token costs for a fixed quality level fall ~70–80% annually, a workload that costs $50,000/month today should cost roughly $10,000–$15,000 a year from now — without you changing a line of code, just by migrating to whatever the cheaper-equivalent model is. I've watched portfolio companies model AI COGS as a fixed line item and miss this entirely. It's a declining line item.
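The planning arithmetic above can be sketched as a small projection helper. It assumes a constant annual decline and that the workload itself (token volume and quality bar) stays fixed; both are modeling assumptions, not guarantees.

```python
def project_monthly_cost(cost_today: float, annual_decline: float, years: float = 1.0) -> float:
    """Project a monthly bill forward, assuming a constant annual price decline."""
    return cost_today * (1 - annual_decline) ** years


if __name__ == "__main__":
    # The $50,000/month workload from the text, at 70% and 80% annual decline.
    for rate in (0.70, 0.80):
        one_year = project_monthly_cost(50_000, rate)
        two_years = project_monthly_cost(50_000, rate, years=2)
        print(f"{rate:.0%} decline: ${one_year:,.0f} in 1 year, ${two_years:,.0f} in 2 years")
```

Dropping this into a margin model in place of a flat COGS line is usually enough to show how quickly the assumption changes the picture.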

Why OpenAI API pricing keeps falling

Four forces compound to push the curve down. None of them are slowing in 2026:

Hardware and utilization gains

Newer GPUs plus better batching and speculative decoding cut the raw compute per token by 30–50% a year.

Model distillation

Small models like GPT-5 nano are trained to mimic frontier models, hitting 90%+ of the quality at 5% of the cost.

Competition

Anthropic's Claude, Google's Gemini, and open-weight models force OpenAI to cut prices to hold market share.

Inference-time optimization

Prompt caching, batching, and routing let providers serve the same demand on far less hardware.

The competitive piece matters most for builders. With AI model valuations stretched and several labs racing for the same enterprise seats, price is one of the few levers that moves a procurement decision. That dynamic keeps OpenAI honest on the API price sheet even as OpenAI's own valuation pushes past $300B.

How to reduce your OpenAI API bill in 2026

You don't have to wait for the next price cut. Three levers stack to cut a production bill 80–90% today, and most teams use none of them:

1. Prompt caching: up to 90% off repeated input
If your prompts share a fixed prefix (system prompt, retrieved context, few-shot examples), cached input tokens bill at roughly 10% of the normal rate. For RAG and agent apps with long static contexts, this alone often halves the total bill.
2. Model routing: send easy work to mini/nano
Route classification, extraction, and simple Q&A to GPT-5 mini ($0.25/$2) or nano ($0.05/$0.40), reserving GPT-5 for hard reasoning. Every request that moves from GPT-5 to nano gets 25x cheaper, so a router that sends 70% of traffic to nano cuts the blended per-request cost by roughly two-thirds.
3. Batch API: 50% off non-urgent jobs
For embeddings, evals, backfills, and overnight processing, the Batch API charges half price with a 24-hour turnaround. Anything that doesn't need a synchronous response should run here.

A worked example: an app doing 10M input and 2M output tokens a day on GPT-5 pays about $32.50/day, or ~$975/month. Move 70% of traffic to nano and cache 60% of input, and the same workload lands near $250/month, about 74% lower. Batching then takes half off whatever can wait: if most of the work tolerates a 24-hour turnaround, the bill approaches $130–$150/month. Same product, ~85% lower cost.
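The worked example can be reproduced with a short cost model. This is a sketch under assumptions the text does not spell out: the nano share applies equally to input and output tokens, the cached share is the same on both models, cached input bills at 10% of list price, and the 50% batch discount stacks multiplicatively on top of caching. Actual billing rules may differ, so treat the output as an estimate.

```python
# List prices in USD per 1M tokens: (input, output), from the pricing table.
GPT5 = (1.25, 10.00)
NANO = (0.05, 0.40)
CACHED_INPUT_FACTOR = 0.10  # assumption: cached input bills at ~10% of list price
BATCH_FACTOR = 0.50         # assumption: batched work bills at 50% of list price


def daily_cost(input_m: float, output_m: float, nano_share: float = 0.0,
               cached_share: float = 0.0, batch_share: float = 0.0) -> float:
    """Estimated USD per day for a workload given in millions of tokens per day."""
    # Blended multiplier on input price once part of the input is cached.
    input_factor = (1 - cached_share) + cached_share * CACHED_INPUT_FACTOR
    total = 0.0
    for (in_price, out_price), share in ((NANO, nano_share), (GPT5, 1 - nano_share)):
        total += share * (input_m * in_price * input_factor + output_m * out_price)
    # Blended multiplier once part of the work runs through the Batch API.
    return total * ((1 - batch_share) + batch_share * BATCH_FACTOR)


if __name__ == "__main__":
    scenarios = {
        "baseline (all GPT-5)": {},
        "70% routed to nano": {"nano_share": 0.7},
        "routing + 60% cached input": {"nano_share": 0.7, "cached_share": 0.6},
        "routing + caching + 80% batched": {"nano_share": 0.7, "cached_share": 0.6, "batch_share": 0.8},
    }
    for label, levers in scenarios.items():
        print(f"{label}: ${30 * daily_cost(10, 2, **levers):,.2f}/month")
```

Under these assumptions, routing plus caching comes to about $253/month, and batching between 80% and 100% of the work brings it to roughly $127–$152/month. The batch share is the most sensitive input, so it is worth measuring how much traffic can really wait before budgeting on the low end.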

What this means for builders and investors

What gets easier

  • ✓ Margins improve automatically as token costs fall ~80%/yr
  • ✓ Features that were uneconomic in 2024 ship in 2026
  • ✓ High-volume nano workloads at $0.05/1M unlock new use cases
  • ✓ Caching and batching turn AI COGS into a tunable knob

What gets harder

  • ✕ "We're cheaper" is not a moat — everyone's costs fall too
  • ✕ Reselling raw tokens with a markup gets competed away
  • ✕ Margin from model arbitrage erodes as prices converge
  • ✕ Defensibility has to come from data, workflow, or distribution

For investors, the takeaway is simple: a startup whose entire pitch is "AI is expensive and we use it efficiently" has a thesis with a 12-month shelf life. The interesting companies treat the cost curve as a tailwind — they build products that only make sense once tokens are nearly free, and they put their moat somewhere the price sheet can't touch.

Frontier intelligence got ~25x cheaper in under three years — and the curve isn't bending up.

Price your product for where tokens will be, not where they are.

Track AI model pricing and valuation trends on the AI Valuations Dashboard at Value Add VC. Originally published in the Trace Cohen newsletter.

Frequently Asked Questions

How much does the OpenAI API cost in 2026?

GPT-5, OpenAI's flagship model, costs $1.25 per 1 million input tokens and $10 per 1 million output tokens in 2026. Cheaper tiers go much lower: GPT-5 mini is $0.25/$2 and GPT-5 nano is $0.05/$0.40 per 1M tokens. A typical 1,000-token request with a 500-token answer costs about $0.0063 on GPT-5 and under a tenth of a cent on nano.

Why is OpenAI API pricing falling so fast?

Inference costs have dropped roughly 80% per year because of better GPU utilization, smaller distilled models that match older frontier quality, speculative decoding, and aggressive competition from Anthropic, Google, and open-weight models. GPT-4 launched at $30/$60 per 1M tokens in March 2023; the equivalent intelligence costs under $1.25/$10 in 2026 — about a 25x reduction.

What is the cheapest OpenAI model in 2026?

GPT-5 nano is the cheapest at $0.05 per 1M input tokens and $0.40 per 1M output tokens. It is built for high-volume classification, routing, and extraction tasks where you do not need frontier reasoning. At those rates you can process roughly 20 million input tokens for $1, versus about 800,000 on GPT-5.

How can I reduce my OpenAI API bill?

The biggest levers are prompt caching (up to 90% off repeated input tokens), routing easy requests to GPT-5 mini or nano, and the Batch API (50% off for non-urgent jobs with a 24-hour turnaround). Combining cached inputs, model routing, and batching can cut a production bill by 80–90% without changing output quality on most tasks.

Is GPT-5 cheaper than GPT-4o was?

Yes. GPT-4o was priced at $2.50 per 1M input and $10 per 1M output tokens in 2024. GPT-5 in 2026 is $1.25 input and $10 output: half the input cost for a more capable model. Output pricing held flat while reasoning quality improved, which is why output tokens now dominate most production bills.
