Cerebras announced on June 26 that it will host OpenAI's newly previewed GPT-5.6 Sol on its wafer-scale WSE-3 processors starting in July, delivering up to 750 tokens per second — a full order of magnitude faster than typical Nvidia H100 deployments of frontier models.
Why this speed matters: agentic workflows chain many token generations across tool calls, so end-to-end latency is often dominated by how fast the model generates tokens. At 750 tok/sec, an agent run that spends 30 seconds generating tokens on standard GPU infrastructure would finish in about 3 seconds, assuming the full tenfold speedup applies. That can be the difference between users abandoning a workflow and adopting it as core to their day.
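The arithmetic behind the 30-second example can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a benchmark: the 75 tok/sec baseline is an assumption (one tenth of the quoted 750 tok/sec), the 2,250-token workload is chosen only so the baseline lands on 30 seconds, and the function name and the fixed per-tool-call overhead are hypothetical.

```python
def agent_latency_s(output_tokens: int, tokens_per_s: float,
                    tool_calls: int = 0, tool_overhead_s: float = 0.0) -> float:
    """Token generation time plus a fixed overhead per tool call, in seconds."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return output_tokens / tokens_per_s + tool_calls * tool_overhead_s


# Assumed workload: 2,250 generated tokens across the whole agent run.
TOKENS = 2250
BASELINE_TPS = 75.0   # assumption: one tenth of the quoted figure
FAST_TPS = 750.0      # throughput figure quoted in the article

# Generation only: the speedup carries through in full.
baseline = agent_latency_s(TOKENS, BASELINE_TPS)
fast = agent_latency_s(TOKENS, FAST_TPS)
print(f"generation only: {baseline:.1f} s vs {fast:.1f} s")

# Add 5 tool calls at an assumed 1 s each: that time does not shrink,
# so the end-to-end speedup is smaller than the raw throughput ratio.
baseline_tools = agent_latency_s(TOKENS, BASELINE_TPS, tool_calls=5, tool_overhead_s=1.0)
fast_tools = agent_latency_s(TOKENS, FAST_TPS, tool_calls=5, tool_overhead_s=1.0)
print(f"with tool calls: {baseline_tools:.1f} s vs {fast_tools:.1f} s "
      f"({baseline_tools / fast_tools:.1f}x)")
```

Under these assumptions the generation-only case reproduces the 30-second versus 3-second comparison, while any time spent waiting on tools is untouched by faster inference. The headline speedup is therefore an upper bound on what an agent user would actually see.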
Commercial context: Cerebras is currently marketing its S-1 filing ahead of a planned 2026 IPO. Landing a marquee OpenAI deployment ahead of the roadshow substantially strengthens the growth story. Competitors Groq (private, valued at roughly $6B on the secondary market) and SambaNova also sell fast inference on custom silicon, though neither uses a wafer-scale chip, and neither has landed OpenAI as a customer.
Comparable deals: Groq hosts Meta's Llama family at ~500 tok/sec; SambaNova runs older Llama models at ~250 tok/sec. Cerebras's 750 tok/sec figure is measurable, and the company's throughput on prior model deployments has been independently verified.
What to watch: whether OpenAI diversifies further off Nvidia in the coming months (Amazon Trainium, AMD MI300, and now Cerebras all in the mix), the impact on Cerebras's disclosed revenue in its S-1 amendment, and whether end-user pricing carries a premium for the speed advantage or Cerebras absorbs the cost to win share.