OpenAI has unveiled Jalapeño, its first custom AI chip, designed in partnership with Broadcom and built specifically to run AI models at scale -- the inference workloads that power ChatGPT and its API -- rather than to train them. According to OpenAI, early silicon is already showing 'significantly better performance-per-watt than current state-of-the-art alternatives.' The chip is the company's most tangible move yet to lessen its reliance on Nvidia's GPUs.
The chip is the product of a partnership Broadcom and OpenAI first announced in October 2025, when the two committed to co-developing custom accelerators. OpenAI president Greg Brockman framed the effort as workload-specific: 'We have a deep understanding of the workload. We've really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what's possible?' Notably, OpenAI says its own models helped design the chip -- AI building the hardware that will run AI.
The strategic logic is about economics, not bragging rights. Training a frontier model is a periodic, capital-intensive event; inference is the relentless, every-query cost that scales with usage and never stops. By owning the inference silicon, OpenAI attacks the single largest line in its long-run compute budget -- and the one most exposed to Nvidia's pricing power. Intensive pre-training is expected to keep running on Nvidia for now.
OpenAI is following a path Google and Amazon blazed years ago. Google's TPUs and Amazon's Trainium and Inferentia chips, which were likewise built with Broadcom and Marvell as ASIC partners, proved that hyperscalers can design narrow, efficient accelerators that beat general-purpose GPUs on cost-per-token for their own workloads. Meta's MTIA and Microsoft's Maia programs make the same bet. Jalapeño puts the most-watched AI company in the world firmly into that club.
The competitive ripples are immediate. Broadcom has emerged as the quiet kingmaker of custom AI silicon, and a marquee OpenAI chip cements the thesis that has driven its stock to historic highs. For Nvidia, every hyperscaler ASIC chips away at the most lucrative corner of its franchise -- not in training, where its lead is durable, but in high-volume inference, where 'good enough and far cheaper' is a winning pitch. In the same week as the Jalapeño unveiling, Qualcomm agreed to buy Modular for ~$3.9B to attack Nvidia's CUDA software moat, underscoring how broad the industry-wide assault on Nvidia's dominance has become.
For founders and operators, the message is that compute is becoming a stack to be optimized, not a single vendor to be obeyed. If OpenAI can shave double-digit percentages off inference cost-per-token, it widens its margin to subsidize cheaper API pricing and squeeze rivals who still pay full Nvidia freight. That dynamic flows straight to anyone building on top of these APIs.
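The margin arithmetic behind that claim can be sketched in a few lines of Python. Every number here is a hypothetical placeholder chosen for illustration, not a figure disclosed by OpenAI, and the function names are assumptions of this sketch: a provider charging a fixed price per million tokens sees its gross margin widen when serving cost falls, and it can choose to pass some or all of that saving through as a lower price.

```python
def cost_after_saving(cost: float, saving_fraction: float) -> float:
    """Serving cost per million tokens after a fractional reduction."""
    return cost * (1.0 - saving_fraction)


def gross_margin(price: float, cost: float) -> float:
    """Gross margin in dollars per million tokens served."""
    return price - cost


def price_holding_margin(old_price: float, old_cost: float, new_cost: float) -> float:
    """Lowest price that keeps the same dollar margin as before the saving."""
    return new_cost + gross_margin(old_price, old_cost)


# Hypothetical inputs, for illustration only (not real OpenAI figures).
price = 10.00          # dollars charged per million tokens
baseline_cost = 6.00   # dollars of serving cost per million tokens
saving = 0.20          # a "double-digit" 20% cut in cost-per-token

new_cost = cost_after_saving(baseline_cost, saving)
new_price = price_holding_margin(price, baseline_cost, new_cost)

print(f"margin before: ${gross_margin(price, baseline_cost):.2f}")
print(f"margin after, same price: ${gross_margin(price, new_cost):.2f}")
print(f"price that holds the old margin: ${new_price:.2f}")
```

Under these assumed inputs, the provider can either keep the wider margin or cut its price while earning the same dollars per million tokens as before. A rival paying the baseline cost has no such room.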
The bear case is real: custom silicon is brutally hard, schedules slip, and 'in testing' is not 'in production.' Google spent the better part of a decade maturing TPUs; OpenAI is attempting this while also building data centers, raising tens of billions, and racing to an IPO. Yield, packaging, and the software stack to actually use the chip are where ASIC dreams go to die.
What to watch: a production timeline and volume, whether OpenAI discloses real cost-per-token improvements, how much Nvidia capacity it can actually displace, and whether Jalapeño stays internal or -- like TPUs -- eventually gets rented to others. If it ships at scale, it could reprice the entire inference market.