GPT-4o costs $2.50 per million input tokens and $10 per million output in 2026; GPT-4o mini is $0.15/$0.60, and GPT-5 lands near $1.25/$10. That's the short answer. The longer answer, what a real task actually costs, is more interesting.
Headline per-token rates are the number everyone quotes and the number that matters least. What you actually pay depends on which model you route to, how many reasoning tokens it burns invisibly, whether your input is cached, and whether you run jobs in real time or in batch. Below is the full 2026 price list, then the math that turns those rates into a monthly bill.
OpenAI API Pricing 2026: The Full Per-Token Breakdown
OpenAI API pricing in 2026 spans from $0.15 per million input tokens for GPT-4o mini to $15 per million input tokens for the o1 reasoning model. GPT-4o, the default workhorse, costs $2.50 input and $10 output per million tokens. GPT-5 is priced at roughly $1.25 input and $10 output, cheaper on input than the model it replaces. You pay only for tokens processed.
| Model | Input / 1M | Cached input / 1M | Output / 1M | Best for |
|---|---|---|---|---|
| GPT-5 | $1.25 | $0.125 | $10.00 | Hardest reasoning, long context |
| GPT-5 mini | $0.25 | $0.025 | $2.00 | Cheaper reasoning at scale |
| GPT-4o | $2.50 | $1.25 | $10.00 | General-purpose default |
| GPT-4o mini | $0.15 | $0.075 | $0.60 | High-volume, simple tasks |
| o3 | $2.00 | $0.50 | $8.00 | Deep step-by-step reasoning |
| o3-mini | $1.10 | $0.55 | $4.40 | Cheaper reasoning, coding |
| o1 | $15.00 | $7.50 | $60.00 | Frontier reasoning (legacy) |
Rates are per 1M tokens, USD, standard tier. OpenAI revises pricing frequently; o-series rates in particular have fallen sharply since launch. Always confirm against the live pricing page before committing a budget.
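To make the table concrete, here is a minimal sketch of how one request's cost falls out of those rates, including the cached-input column. The `RATES` dictionary and `request_cost` function are illustrative names, not part of any OpenAI SDK, and the numbers are copied from three rows of the table above, so they will drift as pricing changes.

```python
# USD per 1M tokens, copied from the price table (illustrative subset).
RATES = {
    "gpt-5":       {"input": 1.25, "cached": 0.125, "output": 10.00},
    "gpt-4o":      {"input": 2.50, "cached": 1.25,  "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "cached": 0.075, "output": 0.60},
}

def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    r = RATES[model]
    uncached = input_tokens - cached_tokens
    total = (uncached * r["input"]
             + cached_tokens * r["cached"]
             + output_tokens * r["output"])
    return total / 1_000_000

# An 8K-token document summarized into 600 tokens on GPT-4o, nothing cached.
print(f"${request_cost('gpt-4o', 8_000, 600):.4f}")
```

Keeping cached and uncached input as separate terms matters because the discount differs by model: in the table, cached input is 90% off on GPT-5 but 50% off on GPT-4o.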
What OpenAI API Pricing Looks Like Per Real Task
A token is roughly 0.75 words, so 1,000 tokens is about 750 words. Most real requests are small: a few hundred tokens in, a few hundred out. The table below prices five common workloads on the model you'd actually pick, assuming standard (uncached) input.
| Task | Model | Tokens in / out | Approx. cost per call |
|---|---|---|---|
| Classify a support ticket | GPT-4o mini | 500 / 50 | $0.0001 |
| Summarize a 10-page PDF | GPT-4o | 8K / 600 | $0.026 |
| Draft a marketing email | GPT-4o | 800 / 500 | $0.007 |
| Answer a hard math/logic question | o3 | 1K / 12K* | $0.098 |
| Generate a 400-line code module | GPT-5 | 3K / 6K | $0.064 |
*The o3 example assumes ~12K hidden reasoning tokens billed at the output rate, the single most underestimated line item in any reasoning-model budget.
The lesson: individual calls are cheap, but volume compounds fast. A product doing 5 million GPT-4o summaries a month at $0.026 each is spending $130,000, and a large share of that is avoidable with the right routing and caching, covered below. For how this rolls up across the industry, see our AI Spending dashboard.
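The per-task arithmetic is simple enough to check by hand. The sketch below reprices the five workloads from the standard input and output rates in the price list and then rolls the summary task up to 5 million calls a month. The names `WORKLOADS` and `task_cost` are made up for this example.

```python
# USD per 1M tokens as (input, output), standard uncached rates from the price list.
RATES = {
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "o3": (2.00, 8.00),
    "GPT-5": (1.25, 10.00),
}

# (task, model, input tokens, output tokens including hidden reasoning tokens)
WORKLOADS = [
    ("Classify a support ticket", "GPT-4o mini", 500, 50),
    ("Summarize a 10-page PDF", "GPT-4o", 8_000, 600),
    ("Draft a marketing email", "GPT-4o", 800, 500),
    ("Answer a hard math/logic question", "o3", 1_000, 12_000),
    ("Generate a 400-line code module", "GPT-5", 3_000, 6_000),
]

def task_cost(model, tokens_in, tokens_out):
    """USD cost of one call at standard rates."""
    rate_in, rate_out = RATES[model]
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000

for task, model, t_in, t_out in WORKLOADS:
    print(f"{task:<36} {model:<12} ${task_cost(model, t_in, t_out):.6f}")

# Volume is what turns fractions of a cent into a real bill.
monthly = 5_000_000 * task_cost("GPT-4o", 8_000, 600)
print(f"5M GPT-4o summaries per month: ${monthly:,.0f}")
```

Note how the o3 row is dominated by output: 12K tokens at $8 per million is $0.096 of its roughly $0.098 total.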
Why the o3 and o1 Reasoning Models Cost More
Reasoning models think before they answer, and that thinking is made of tokens you pay for. A GPT-4o answer might be 400 output tokens. The same prompt on o3 can generate 8,000 to 20,000 internal reasoning tokens plus the visible answer, all billed at the $8-per-million output rate. That's why o3, at a lower headline price than o1, can still cost 10-15x more than GPT-4o for the same question.
- Hidden reasoning tokens: 5K-20K tokens per answer, billed at the output rate, invisible until the bill arrives
- Longer outputs: reasoning answers run 3-5x longer than a comparable GPT-4o response
- Retry sensitivity: a failed or truncated reasoning chain still bills for every token generated
- Context reuse: without caching, long reasoning prompts re-bill full input on every call
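A quick way to see the effect of hidden reasoning tokens is to price the same question on both models. This is a rough sketch: the 1,000-token prompt is an assumption, the 400-token answer and the 8,000 to 20,000 reasoning-token range come from the paragraph above, and real reasoning lengths vary per request.

```python
# USD per 1M tokens, from the price list.
GPT4O = {"input": 2.50, "output": 10.00}
O3 = {"input": 2.00, "output": 8.00}

def cost(rates, tokens_in, visible_out, reasoning=0):
    """USD cost of one call; hidden reasoning tokens bill at the output rate."""
    billed_out = visible_out + reasoning
    return (tokens_in * rates["input"] + billed_out * rates["output"]) / 1_000_000

PROMPT_TOKENS = 1_000  # assumed prompt size
ANSWER_TOKENS = 400    # visible answer length used in the text

baseline = cost(GPT4O, PROMPT_TOKENS, ANSWER_TOKENS)
for reasoning in (8_000, 20_000):
    o3_cost = cost(O3, PROMPT_TOKENS, ANSWER_TOKENS, reasoning)
    print(f"{reasoning:>6} reasoning tokens: ${o3_cost:.4f} vs ${baseline:.4f} "
          f"({o3_cost / baseline:.1f}x)")
```

Under these assumptions the multiple starts at roughly 10x at the low end of the reasoning range and keeps growing with the length of the chain, even though o3's per-token rates are lower than GPT-4o's.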
The practical rule: don't send a task to o3 unless GPT-4o or GPT-5 demonstrably fails it. Reasoning models are a scalpel, not a default. For roughly 80% of production calls, GPT-4o mini or GPT-4o is both cheaper and fast enough.
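One way to encode that rule is an escalation ladder: try the cheapest model first and move up only when the answer fails a check. The sketch below is deliberately abstract. `call_model` and `is_acceptable` are hypothetical callables you would supply (an API wrapper and a task-specific validator), and the ladder's model labels are placeholders.

```python
from typing import Callable, Optional, Sequence

# Cheapest first; the ladder and its labels are illustrative.
LADDER: Sequence[str] = ("gpt-4o-mini", "gpt-4o", "o3")

def route(prompt: str,
          call_model: Callable[[str, str], str],
          is_acceptable: Callable[[str], bool],
          ladder: Sequence[str] = LADDER) -> tuple[str, Optional[str]]:
    """Try models from cheapest to most expensive; stop at the first good answer.

    Returns (model_used, answer). answer is None if every tier failed the check.
    """
    for model in ladder:
        answer = call_model(model, prompt)
        if is_acceptable(answer):
            return model, answer
    return ladder[-1], None

# Stub standing in for a real API call: only "o3" solves this prompt.
def fake_call(model: str, prompt: str) -> str:
    return "42" if model == "o3" else ""

print(route("hard question", fake_call, lambda a: a != ""))
```

The trade-off is that every failed tier is still billed, so a ladder pays off only when most requests succeed on the cheap tiers. If a task class reliably needs o3, route it there directly.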
OpenAI API Pricing vs Anthropic and Google in 2026
OpenAI isn't priced in a vacuum. Anthropic's Claude and Google's Gemini set the competitive floor, and on a per-token basis the frontier tiers have largely converged near $1.25-$3 input. Here's how the flagship and budget tiers line up.
| Model | Provider | Input / 1M | Output / 1M |
|---|---|---|---|
| GPT-5 | OpenAI | $1.25 | $10.00 |
| GPT-4o | OpenAI | $2.50 | $10.00 |
| GPT-4o mini | OpenAI | $0.15 | $0.60 |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 |
The takeaway: OpenAI is competitive-to-cheap at the flagship tier (GPT-5 matches Gemini 2.5 Pro and undercuts Claude Opus 4 by 12x on input and 7.5x on output), and GPT-4o mini is the cheapest credible general-purpose model at $0.15 input. Anthropic charges a premium for Opus-class quality. For the full competitive picture, see our OpenAI vs Anthropic enterprise breakdown.
Four Levers That Cut an OpenAI API Bill 50-70%
Do this
- Prompt caching: up to 90% off repeated input tokens
- Batch API: 50% off for non-urgent jobs (24h window)
- Route simple tasks to GPT-4o mini (17x cheaper input)
- Trim system prompts, since they bill on every single call
Stop doing this
- Defaulting every call to o3 or o1 reasoning models
- Sending uncached 20K-token system prompts each request
- Paying real-time rates for overnight batch work
- Ignoring max_tokens caps on runaway outputs
Stacked together, these are not marginal. A team caching a 15K-token system prompt across 2 million calls, batching its analytics jobs, and downshifting classification to GPT-4o mini routinely takes a $40,000 monthly bill under $14,000: same product, same quality. The single highest-ROI change is usually prompt caching, because most production apps resend the same instructions on every request.
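To get a feel for how much of a bill is the system prompt, here is a back-of-the-envelope model of the caching and batch levers. Everything about the workload is an assumption chosen for illustration (GPT-5 rates, a 500-token user message, a 90% cache hit rate), and it is not a reconstruction of the $40,000 example. It also covers input spend only, since caching does not change output cost.

```python
M = 1_000_000

def monthly_input_cost(calls, system_tokens, user_tokens,
                       input_rate, cached_rate, cache_hit_rate=0.0):
    """Monthly input spend in USD when the system prompt is cached on a share of calls."""
    cached = calls * cache_hit_rate * system_tokens
    uncached = calls * (system_tokens * (1 - cache_hit_rate) + user_tokens)
    return (cached * cached_rate + uncached * input_rate) / M

def batch_cost(standard_cost, batch_discount=0.50):
    """Apply the 50% Batch API discount from the list above."""
    return standard_cost * (1 - batch_discount)

# Hypothetical workload: 2M calls, 15K-token system prompt, 500-token user input,
# priced at the GPT-5 rates from the price list ($1.25 input, $0.125 cached input).
before = monthly_input_cost(2_000_000, 15_000, 500, 1.25, 0.125)
after = monthly_input_cost(2_000_000, 15_000, 500, 1.25, 0.125, cache_hit_rate=0.9)
print(f"input spend: ${before:,.0f} -> ${after:,.0f}")
print(f"$10,000 of overnight jobs via batch: ${batch_cost(10_000):,.0f}")
```

With these inputs the monthly input spend falls from about $38,750 to about $8,375. The result is sensitive to the hit rate, so it is worth measuring actual cache hits from API usage data rather than assuming them.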
The per-token number on the pricing page is not your bill.
Routing, caching, and batching decide whether GPT-4o costs you $14K or $40K a month, and that gap is entirely an engineering choice.
Track AI model economics and provider spending on the AI Valuations and AI Spending dashboards at Value Add VC. Originally published in the Trace Cohen newsletter. Pricing figures are 2026 estimates and change frequently; confirm against OpenAI's live pricing page.