Value Add VC
AI & Technology · June 21, 2026 · 11 min read · Last updated: June 21, 2026

OpenAI API Pricing 2026: GPT-4o, o3, and GPT-5 Cost Per Token Breakdown

Every OpenAI model, what it costs per million tokens in 2026, what a real task actually costs to run, and the four levers that cut an API bill by more than half.

Trace Cohen
Co-Founder & GP at Six Point Ventures · 3x founder (BrandYourself, Launch.it, SPOT) · 65+ investments · Based in Boca Raton, FL
@Trace_Cohen · t@nyvp.com · South Florida Advisory

Quick Answer

GPT-4o costs $2.50 per million input tokens and $10 per million output in 2026, while GPT-4o mini runs $0.15/$0.60 and the o3 reasoning model lands near $2/$8 after price cuts. GPT-5 sits around $1.25/$10, cheaper than GPT-4o on input despite stronger reasoning and a 400K-token context window.

GPT-4o costs $2.50 per million input tokens and $10 per million output in 2026; GPT-4o mini is $0.15/$0.60, and GPT-5 lands near $1.25/$10. That's the short answer. The longer answer, what a real task actually costs, is more interesting.

Headline per-token rates are the number everyone quotes and the number that matters least. What you actually pay depends on which model you route to, how many reasoning tokens it burns invisibly, whether your input is cached, and whether you run jobs in real time or in batch. Below is the full 2026 price list, then the math that turns those rates into a monthly bill.

OpenAI API Pricing 2026: The Full Per-Token Breakdown

OpenAI API pricing in 2026 spans from $0.15 per million input tokens for GPT-4o mini to $15 per million for the o1 reasoning model. GPT-4o, the default workhorse, costs $2.50 input and $10 output per million tokens. GPT-5 is priced at roughly $1.25 input and $10 output, cheaper on input than the model it replaces. You pay only for tokens processed.

| Model | Input / 1M | Cached input / 1M | Output / 1M | Best for |
|---|---|---|---|---|
| GPT-5 | $1.25 | $0.125 | $10.00 | Hardest reasoning, long context |
| GPT-5 mini | $0.25 | $0.025 | $2.00 | Cheaper reasoning at scale |
| GPT-4o | $2.50 | $1.25 | $10.00 | General-purpose default |
| GPT-4o mini | $0.15 | $0.075 | $0.60 | High-volume, simple tasks |
| o3 | $2.00 | $0.50 | $8.00 | Deep step-by-step reasoning |
| o3-mini | $1.10 | $0.55 | $4.40 | Cheaper reasoning, coding |
| o1 | $15.00 | $7.50 | $60.00 | Frontier reasoning (legacy) |

Rates are per 1M tokens, USD, standard tier. OpenAI revises pricing frequently, and o-series rates in particular have fallen sharply since launch. Always confirm against the live pricing page before committing a budget.
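To make the cached-input column easier to read, here is a minimal Python sketch that turns each pair of rates into a discount percentage. The rates are copied from the table above; the `RATES` dictionary and the `cached_discount` helper are illustrative names, not part of any OpenAI SDK.

```python
# (standard input, cached input) in USD per 1M tokens, copied from the table above.
RATES = {
    "GPT-5": (1.25, 0.125),
    "GPT-5 mini": (0.25, 0.025),
    "GPT-4o": (2.50, 1.25),
    "GPT-4o mini": (0.15, 0.075),
    "o3": (2.00, 0.50),
    "o3-mini": (1.10, 0.55),
    "o1": (15.00, 7.50),
}


def cached_discount(model: str) -> float:
    """Fraction saved on input tokens that are billed at the cached rate."""
    standard, cached = RATES[model]
    return 1 - cached / standard


if __name__ == "__main__":
    for model in RATES:
        print(f"{model:12s} {cached_discount(model):.0%} off cached input")
```

On these figures the discount is not uniform: the GPT-5 family saves 90% on cached input, o3 saves 75%, and the GPT-4o family, o3-mini, and o1 save 50%.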

What OpenAI API Pricing Looks Like Per Real Task

A token is roughly 0.75 words, so 1,000 tokens is about 750 words. Most real requests are small: a few hundred tokens in, a few hundred out. The table below prices five common workloads on the model you'd actually pick, assuming standard (uncached) input.

| Task | Model | Tokens (in / out) | Cost per call |
|---|---|---|---|
| Classify a support ticket | GPT-4o mini | 500 / 50 | $0.00011 |
| Summarize a 10-page PDF | GPT-4o | 8K / 600 | $0.026 |
| Draft a marketing email | GPT-4o | 800 / 500 | $0.007 |
| Answer a hard math/logic question | o3 | 1K / 12K* | $0.098 |
| Generate a 400-line code module | GPT-5 | 3K / 6K | $0.064 |

*The o3 example assumes ~12K hidden reasoning tokens billed at the output rate, the single most underestimated line item in any reasoning-model budget.
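The per-task figures follow directly from the per-million rates. The sketch below reproduces them in Python, assuming the article's standard (uncached) rates; the `PRICES` keys are display labels chosen for this example and `request_cost` is a hypothetical helper.

```python
# (input, output) in USD per 1M tokens, from the price table in this article.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "o3": (2.00, 8.00),
    "GPT-5": (1.25, 10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call in USD; output_tokens must include any reasoning tokens."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# The five workloads from the table: (task, model, tokens in, tokens out).
TASKS = [
    ("Classify a support ticket", "GPT-4o mini", 500, 50),
    ("Summarize a 10-page PDF", "GPT-4o", 8_000, 600),
    ("Draft a marketing email", "GPT-4o", 800, 500),
    ("Answer a hard math/logic question", "o3", 1_000, 12_000),
    ("Generate a 400-line code module", "GPT-5", 3_000, 6_000),
]

if __name__ == "__main__":
    for task, model, tokens_in, tokens_out in TASKS:
        print(f"{task:35s} {model:12s} ${request_cost(model, tokens_in, tokens_out):.5f}")
```

Multiplying a per-call figure by monthly volume gives the bill: `5_000_000 * request_cost("GPT-4o", 8_000, 600)` comes to $130,000.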

The lesson: individual calls are cheap, but volume compounds fast. A product doing 5 million GPT-4o summaries a month at $0.026 each is spending $130,000, and more than half of that is avoidable with the right routing and caching, covered below. For how this rolls up across the industry, see our AI Spending dashboard.

Why the o3 and o1 Reasoning Models Cost More

Reasoning models think before they answer, and that thinking is made of tokens you pay for. A GPT-4o answer might be 400 output tokens. The same prompt on o3 can generate 8,000–20,000 internal reasoning tokens plus the visible answer, all billed at the $8-per-million output rate. That's why o3, priced well below o1 and even slightly below GPT-4o per token, can still cost 10–15x more than GPT-4o for the same question.

  • Hidden reasoning tokens: 5K–20K tokens per answer, billed at output rate, invisible until the bill arrives
  • Longer outputs: reasoning answers run 3–5x longer than a comparable GPT-4o response
  • Retry sensitivity: a failed or truncated reasoning chain still bills for every token generated
  • Context reuse: without caching, long reasoning prompts re-bill full input on every call

The practical rule: don't send a task to o3 unless GPT-4o or GPT-5 demonstrably fails it. Reasoning models are a scalpel, not a default. For roughly 80% of production calls, GPT-4o mini or GPT-4o is both cheaper and fast enough.
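A rough way to see the reasoning-token effect is to price the same question on both models. The sketch below is illustrative only: the 1,000-token prompt is an assumption, the 400-token visible answer and the 8,000–20,000 reasoning range come from the text above, and `answer_cost` is a hypothetical helper.

```python
def answer_cost(input_tokens, visible_output_tokens, reasoning_tokens,
                in_rate, out_rate):
    """Cost of one answer in USD; reasoning tokens are billed at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * in_rate + billed_output * out_rate) / 1_000_000


# Assumed prompt: 1,000 input tokens and a 400-token visible answer on both models.
gpt4o = answer_cost(1_000, 400, 0, in_rate=2.50, out_rate=10.00)
o3_low = answer_cost(1_000, 400, 8_000, in_rate=2.00, out_rate=8.00)
o3_high = answer_cost(1_000, 400, 20_000, in_rate=2.00, out_rate=8.00)

if __name__ == "__main__":
    print(f"GPT-4o:               ${gpt4o:.4f}")
    print(f"o3, 8K reasoning:     ${o3_low:.4f} ({o3_low / gpt4o:.1f}x GPT-4o)")
    print(f"o3, 20K reasoning:    ${o3_high:.4f} ({o3_high / gpt4o:.1f}x GPT-4o)")
```

Under these assumptions the multiple is a little over 10x at 8,000 reasoning tokens and grows from there, so the result is very sensitive to how much the model actually reasons on a given prompt.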

OpenAI API Pricing vs Anthropic and Google in 2026

OpenAI isn't priced in a vacuum. Anthropic's Claude and Google's Gemini set the competitive floor, and on a per-token basis the frontier tiers have largely converged near $1.25–$3 input. Here's how the flagship and budget tiers line up.

| Model | Provider | Input / 1M | Output / 1M |
|---|---|---|---|
| GPT-5 | OpenAI | $1.25 | $10.00 |
| GPT-4o | OpenAI | $2.50 | $10.00 |
| GPT-4o mini | OpenAI | $0.15 | $0.60 |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 |
| Claude Haiku 4 | Anthropic | $0.80 | $4.00 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 |

The takeaway: OpenAI is competitive-to-cheap at the flagship tier (GPT-5 matches Gemini 2.5 Pro and undercuts Claude Opus 4 by 12x on input and 7.5x on output), and GPT-4o mini is the cheapest credible general-purpose model at $0.15 input. Anthropic charges a premium for Opus-class quality. For the full competitive picture, see our OpenAI vs Anthropic enterprise breakdown.
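Per-token rates are easier to compare when applied to one fixed workload. The sketch below prices the 8K-in / 600-out summarization task from earlier in this article on every model in the table. It uses only the rates listed above; `MODELS` and `workload_cost` are illustrative names.

```python
# model -> (provider, input rate, output rate) in USD per 1M tokens, from the table above.
MODELS = {
    "GPT-5": ("OpenAI", 1.25, 10.00),
    "GPT-4o": ("OpenAI", 2.50, 10.00),
    "GPT-4o mini": ("OpenAI", 0.15, 0.60),
    "Claude Opus 4": ("Anthropic", 15.00, 75.00),
    "Claude Sonnet 4": ("Anthropic", 3.00, 15.00),
    "Claude Haiku 4": ("Anthropic", 0.80, 4.00),
    "Gemini 2.5 Pro": ("Google", 1.25, 10.00),
    "Gemini 2.5 Flash": ("Google", 0.30, 2.50),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call on the given model at standard rates."""
    _provider, in_rate, out_rate = MODELS[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Rank every model, cheapest first, on the same 8,000-in / 600-out request.
ranked = sorted(MODELS, key=lambda m: workload_cost(m, 8_000, 600))

if __name__ == "__main__":
    for model in ranked:
        provider = MODELS[model][0]
        print(f"{model:18s} {provider:10s} ${workload_cost(model, 8_000, 600):.5f}")
```

This compares list prices only. It says nothing about output quality, and different tokenizers can produce different token counts for the same text, so treat the ranking as a starting point.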

Four Levers That Cut an OpenAI API Bill 50–70%

Do this

  • ✓ Prompt caching: up to 90% off repeated input tokens
  • ✓ Batch API: 50% off for non-urgent jobs (24h window)
  • ✓ Route simple tasks to GPT-4o mini (17x cheaper input)
  • ✓ Trim system prompts, because they bill on every single call

Stop doing this

  • ✕ Defaulting every call to o3 or o1 reasoning models
  • ✕ Sending uncached 20K-token system prompts each request
  • ✕ Paying real-time rates for overnight batch work
  • ✕ Ignoring max_tokens caps on runaway outputs

Stacked together, these are not marginal. A team caching a 15K-token system prompt across 2 million calls, batching its analytics jobs, and downshifting classification to GPT-4o mini routinely takes a $40,000 monthly bill under $14,000: same product, same quality. The single highest-ROI change is usually prompt caching, because most production apps resend the same instructions on every request.
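To see how each lever moves a bill, here is a minimal Python sketch using the article's GPT-4o and GPT-4o mini rates. The workload (1 million calls, a 4,000-token system prompt, 500 user tokens in, 50 tokens out) is hypothetical, as are `monthly_cost` and its `cache_hit_rate` knob. Real cache hit rates sit below 100%, and the sketch does not assert that the caching and batch discounts stack.

```python
# Rates in USD per 1M tokens, from the price table in this article.
GPT4O = {"input": 2.50, "cached": 1.25, "output": 10.00}
GPT4O_MINI = {"input": 0.15, "cached": 0.075, "output": 0.60}


def monthly_cost(calls, system_tokens, user_tokens, output_tokens, rates,
                 cache_hit_rate=0.0, batch=False):
    """Estimated monthly bill in USD for one workload.

    cache_hit_rate: assumed fraction of calls whose system prompt is billed
                    at the cached rate (0.0 means no caching).
    batch:          apply the 50% Batch API discount quoted in the article.
    """
    # Blend the cached and standard rates for the system prompt.
    system_rate = (cache_hit_rate * rates["cached"]
                   + (1 - cache_hit_rate) * rates["input"])
    per_call = (system_tokens * system_rate
                + user_tokens * rates["input"]
                + output_tokens * rates["output"]) / 1_000_000
    total = calls * per_call
    return total * 0.5 if batch else total


# Hypothetical workload, priced four ways.
WORKLOAD = dict(calls=1_000_000, system_tokens=4_000, user_tokens=500, output_tokens=50)

baseline = monthly_cost(rates=GPT4O, **WORKLOAD)
cached = monthly_cost(rates=GPT4O, cache_hit_rate=1.0, **WORKLOAD)
batched = monthly_cost(rates=GPT4O, batch=True, **WORKLOAD)
routed = monthly_cost(rates=GPT4O_MINI, **WORKLOAD)

if __name__ == "__main__":
    print(f"GPT-4o baseline:        ${baseline:,.0f}")
    print(f"GPT-4o, prompt cached:  ${cached:,.0f}")
    print(f"GPT-4o, batched:        ${batched:,.0f}")
    print(f"Routed to GPT-4o mini:  ${routed:,.0f}")
```

On this particular workload routing dominates, because the task is small enough for the cheaper model. For a workload that genuinely needs GPT-4o, caching and batching are the levers left, which is why it helps to price each option separately before choosing.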

The per-token number on the pricing page is not your bill.

Routing, caching, and batching decide whether GPT-4o costs you $14K or $40K a month, and that gap is entirely an engineering choice.

Track AI model economics and provider spending on the AI Valuations and AI Spending dashboards at Value Add VC. Originally published in the Trace Cohen newsletter. Pricing figures are 2026 estimates and change frequently; confirm against OpenAI's live pricing page.


Frequently Asked Questions

How much does the OpenAI API cost in 2026?

OpenAI API pricing in 2026 ranges from $0.15 per million input tokens for GPT-4o mini to $15 per million for the o1 reasoning model. The workhorse GPT-4o costs $2.50 input and $10 output per million tokens, while GPT-5 is priced around $1.25 input and $10 output. You only pay for tokens actually processed, billed per request.

How much does GPT-5 cost per token compared to GPT-4o?

GPT-5 costs roughly $1.25 per million input tokens and $10 per million output tokens in 2026, which makes its input price 50% cheaper than GPT-4o's $2.50 while output stays the same at $10. Cached input on GPT-5 drops to about $0.125 per million, so high-reuse workloads see the biggest savings versus GPT-4o.

Why is the o3 reasoning model more expensive than GPT-4o?

The o3 reasoning model costs about $2 input and $8 output per million tokens, but the real cost driver is hidden reasoning tokens. A single o3 answer can burn 5,000 to 20,000 internal reasoning tokens you pay for at the output rate, so a task that costs $0.01 on GPT-4o can cost $0.15 or more on o3 despite similar headline pricing.

How can I reduce my OpenAI API bill?

The four biggest levers are prompt caching (up to 90% off repeated input tokens), the Batch API (50% off for non-urgent jobs), routing simple tasks to GPT-4o mini instead of GPT-4o, and trimming system prompts. Combined, these routinely cut a production API bill by 50–70% without changing output quality.

Is GPT-4o mini cheap enough for high-volume apps?

Yes. At $0.15 per million input and $0.60 per million output tokens, GPT-4o mini is roughly 17x cheaper than GPT-4o on input and powers most high-volume classification, extraction, and routing workloads. A million short classification calls at about 500 input and 50 output tokens each costs about $105 on GPT-4o mini versus about $1,750 on GPT-4o.


Trace Cohen is a serial founder, investor and data geek. Please feel free to reach out at t@nyvp.com.
