Cerebras Runs OpenAI GPT-5.6 Sol at 750 Tokens per Second, Setting a New Frontier-Model Speed Record

Cerebras Systems will deploy OpenAI's GPT-5.6 Sol on its wafer-scale WSE-3 chips at up to 750 tokens per second in July — roughly 10x faster than any Nvidia GPU deployment of a frontier model in production. The partnership is a strategic proof point for Cerebras ahead of its planned 2026 IPO.

Peak Speed: Up to 750 tok/sec

Model: GPT-5.6 Sol (flagship)

Deployment Month: July 2026

Comparable Nvidia H100: ~70 tok/sec typical

Chip Platform: Cerebras WSE-3

Trace Cohen

Early-stage VC & angel · Founder, New York Venture Partners

June 26, 2026

1 min read

Cerebras announced on June 26 that it will host OpenAI's newly previewed GPT-5.6 Sol on its wafer-scale WSE-3 processors starting in July, delivering up to 750 tokens per second — a full order of magnitude faster than typical Nvidia H100 deployments of frontier models.

Why this speed matters: agentic workflows chain many token generations across tool calls, so end-to-end latency is often dominated by how fast the model generates tokens. At 750 tok/sec, an agent run that spends 30 seconds generating tokens on standard GPU infrastructure (at roughly 70 tok/sec) would finish that generation in under 3 seconds. That can be the difference between users abandoning a workflow and adopting it as core to their day.
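The arithmetic behind that 30-second-to-under-3 comparison can be checked with a short sketch. It assumes generation time is simply tokens divided by throughput, and it ignores tool-call latency, network overhead, and time to first token, none of which the article quantifies. The function and constant names are illustrative; the two throughput figures are the ones quoted above.

```python
# Back-of-the-envelope check of the latency claim, using the article's figures.
# Assumption: generation time = tokens / throughput; tool calls, network time,
# and time-to-first-token are ignored.

H100_TOK_PER_SEC = 70        # "~70 tok/sec typical" for a comparable Nvidia H100
CEREBRAS_TOK_PER_SEC = 750   # "up to 750 tok/sec" peak on Cerebras WSE-3


def generation_seconds(total_tokens: float, tokens_per_second: float) -> float:
    """Seconds needed to generate total_tokens at a steady throughput."""
    return total_tokens / tokens_per_second


# Tokens an agent produces if it spends 30 seconds generating at the H100 rate.
total_tokens = 30 * H100_TOK_PER_SEC

gpu_seconds = generation_seconds(total_tokens, H100_TOK_PER_SEC)
cerebras_seconds = generation_seconds(total_tokens, CEREBRAS_TOK_PER_SEC)
speedup = CEREBRAS_TOK_PER_SEC / H100_TOK_PER_SEC

print(f"tokens generated: {total_tokens}")
print(f"GPU time: {gpu_seconds:.1f} s, Cerebras time: {cerebras_seconds:.1f} s")
print(f"speedup: {speedup:.1f}x")
```

Under these assumptions the same token budget takes 2.8 seconds at the peak rate, consistent with the "under 3" figure. Real agent runs also spend time waiting on tools, so the end-to-end gain would likely be smaller than the raw throughput ratio.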


Commercial context: Cerebras is currently marketing its S-1 filing ahead of a planned 2026 IPO. Landing a marquee OpenAI deployment ahead of the roadshow substantially strengthens the growth story. Competitors Groq (private, valued at roughly $6B on the secondary market) and SambaNova also sell inference on custom, non-GPU silicon but haven't landed OpenAI as a customer.

Comparable deals: Groq hosts Meta's Llama family at ~500 tok/sec; SambaNova runs older Llama models at ~250 tok/sec. Cerebras's 750 tok/sec figure is a measurable claim, and the company's throughput numbers have been independently verified in prior model deployments.

What to watch: whether OpenAI diversifies further off Nvidia in the coming months (Amazon Trainium, AMD MI300, and now Cerebras all in the mix), the impact on Cerebras's disclosed revenue in its S-1 amendment, and whether pricing to end users reflects the speed advantage or Cerebras absorbs it to win share.
