Groq has raised $650 million in a financing led by Infinitum and Disruptive, the company's latest infusion as it scales an AI inference business built on its own custom silicon. Groq designs Language Processing Units (LPUs), chips purpose-built for running -- not training -- AI models with very low latency, and pairs them with a cloud service that sells that speed directly to developers and enterprises.
The company's pitch is vertical integration: by owning both the chip and the cloud that runs it, Groq argues it can deliver tokens faster and cheaper than competitors stitching together Nvidia GPUs. Founded by Jonathan Ross, a former architect of Google's TPU, Groq has leaned into inference specifically -- the high-volume, latency-sensitive workload that is becoming the dominant share of AI compute as applications move from demos into production.
The raise lands amid feverish investor appetite for the inference layer. It came the same week Baseten closed a $1.5 billion round at a $13 billion valuation, putting two large bets on inference within days of each other. Together, the deals signal that investors see the serving of AI models -- regardless of which lab's model wins -- as a structurally growing market. Groq competes with Nvidia's GPU stack, the hyperscalers' inference offerings, and software-layer players like Together AI and Fireworks.
The broader context is the custom-silicon wave reshaping AI hardware. As OpenAI, Google, Amazon and others build their own accelerators to escape Nvidia's margins, Groq is the merchant version of that thesis -- offering specialized inference silicon to anyone, not just hyperscalers large enough to design their own. Its edge has to be raw performance-per-dollar on real workloads.
The bear case is formidable competition and capital intensity: building chips and a cloud simultaneously is enormously expensive, and Groq must out-execute both Nvidia's ecosystem and well-funded software rivals. What to watch: Groq's deployed capacity and customer wins, independent benchmarks of its LPU performance versus GPUs, and whether the inference market consolidates around a few platforms or stays fragmented enough for a specialist to thrive.