Nvidia's data-center GPUs carry gross margins above 70%, and that one number is why Google, Amazon, and Microsoft each spent billions designing their own AI chips.
That's the short answer. The longer answer is more interesting. Building a competitive AI accelerator is one of the hardest engineering problems in the world, yet three of the richest companies on earth decided it was still cheaper than paying Nvidia's markup forever. They aren't trying to beat Nvidia on raw speed. They're trying to own the cost curve underneath the AI boom, and at over $270B of combined 2025 capex, even a few points of margin is real money.
Why Google, Amazon, and Microsoft are building their own AI chips
Google, Amazon, and Microsoft build custom AI chips (Google's TPU, Amazon's Trainium and Inferentia, and Microsoft's Maia) for three reasons. They want to escape Nvidia's 70%-plus GPU gross margins. They want to cut cost per token on suited workloads, where Amazon claims 30–40% better price-performance for Trainium2. And they want to reduce reliance on a single supplier they cannot buy enough from. Custom silicon is a margin and supply-chain strategy first, a performance strategy second.
The logic is brutal and simple. When AWS or Azure buys an Nvidia H200, roughly 70 cents of every dollar is Nvidia's gross profit. Multiply that across hundreds of thousands of chips and the leakage runs into the tens of billions of dollars. Designing your own accelerator is a multi-year, multi-billion-dollar bet. If it works, it converts a permanent tax into an asset you control. In effect, the hyperscalers are paying a large fixed cost up front to delete a much larger variable cost forever.
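The trade described here pits a fixed cost against a recurring one, and the arithmetic is easy to sketch. The Python below is a minimal illustration, not a model of any company's actual books. The function names are hypothetical. Every input (program cost, annual GPU spend, share of work moved in-house, relative in-house cost) is a made-up placeholder rather than a disclosed figure.

```python
"""Back-of-envelope payback for an in-house AI chip program (illustrative only)."""


def vendor_gross_profit(gpu_spend: float, gross_margin: float) -> float:
    """Portion of a GPU purchase that the vendor keeps as gross profit."""
    return gpu_spend * gross_margin


def annual_savings(gpu_spend: float, share_moved: float, cost_ratio: float) -> float:
    """Yearly savings from moving part of the GPU budget's work to in-house chips.

    share_moved: fraction of annual GPU spend whose work moves in-house (0 to 1).
    cost_ratio:  in-house cost of doing that same work, relative to the GPU cost.
    """
    if not 0.0 <= share_moved <= 1.0:
        raise ValueError("share_moved must be between 0 and 1")
    return gpu_spend * share_moved * (1.0 - cost_ratio)


def payback_years(program_cost: float, gpu_spend: float,
                  share_moved: float, cost_ratio: float) -> float:
    """Years until cumulative savings cover the up-front program cost."""
    savings = annual_savings(gpu_spend, share_moved, cost_ratio)
    if savings <= 0:
        return float("inf")  # the chip never pays for itself
    return program_cost / savings


if __name__ == "__main__":
    # Hypothetical inputs, in dollars: $20B/yr of GPUs and a $5B chip program,
    # with 30% of the work moved in-house at 60% of the GPU cost.
    spend, program = 20e9, 5e9
    print(f"Vendor gross profit at 70%: ${vendor_gross_profit(spend, 0.70) / 1e9:.1f}B")
    print(f"Annual savings: ${annual_savings(spend, 0.30, 0.60) / 1e9:.1f}B")
    print(f"Payback: {payback_years(program, spend, 0.30, 0.60):.1f} years")
```

The lever that matters is `cost_ratio`. Payback depends on what the in-house chip costs to run relative to the GPU, not on its peak speed. That matches the margin-over-performance framing of the custom-silicon bet.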
There's a second motive that doesn't show up in margin math: supply. Through 2024 and 2025, Nvidia's best chips were rationed, and even the largest customers waited in line. Owning a chip program means never being fully at the mercy of one vendor's allocation list. Track the full spending picture on the AI Spending Dashboard.
Custom AI chips compared: Google TPU, Amazon Trainium, and Microsoft Maia
The three custom AI chip programs from Google, Amazon, and Microsoft are at very different stages of maturity. Google has a decade's head start, Amazon has the most aggressive customer anchor, and Microsoft is the newest entrant playing catch-up. Here is how the silicon lines up, with Meta's MTIA and Nvidia's GPUs included for reference.
| Company | Chip | First shipped | Status (2026) | Anchor workload |
|---|---|---|---|---|
| Google | TPU v7 (Ironwood) | 2015 (v1) | 7th-gen, in production | Gemini, Google Cloud |
| Amazon | Trainium3 | 2020 (Trn1) | Trn2 volume, Trn3 ramping | Anthropic / Bedrock |
| Amazon | Inferentia2 | 2019 (Inf1) | Generally available | AWS inference serving |
| Microsoft | Maia 100 | 2023 | Deployed, Maia 200 next | OpenAI, Copilot inference |
| Meta | MTIA v2 | 2023 | Internal production | Ranking, recommendations |
| Nvidia | GB200 / H200 | 2022 (H100) | Industry standard | Everyone, via CUDA |
Generations and dates are approximate and blend public disclosures with reporting. The pattern is what matters: Google is years ahead, and everyone else is racing to close the gap. See the head-to-head in AI Hardware Wars: Nvidia vs AMD vs Google TPU.
Google's TPU: the ten-year head start
Google shipped its first Tensor Processing Unit in 2015, years before generative AI was a market. By 2026 it is on its seventh generation (Ironwood), and TPUs train and serve Gemini end to end. That makes Google the one hyperscaler that can run a frontier model without buying a single Nvidia GPU, if it chooses to.
That maturity shows up commercially. Google rents TPU capacity through Google Cloud, and outside customers (at various points including Apple and Anthropic) have trained large models on TPU pods. The latest generation, codenamed Ironwood, is built specifically for inference at scale, with pods wiring together thousands of chips over a custom optical interconnect. The strategic edge isn't any single chip. It's that Google has been compounding hardware-software co-design for a decade while competitors were still drawing schematics.
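The catch is software. Nvidia's CUDA is the default for nearly every AI researcher on earth. TPUs, by contrast, are programmed through Google's XLA compiler, most naturally from JAX. PyTorch also works on TPUs through the PyTorch/XLA bridge, but CUDA remains the path of least resistance. That friction is why TPUs dominate inside Google and remain a niche choice outside it, a recurring theme for every custom chip.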
Amazon's Trainium and Microsoft's Maia: buying their way to relevance
Amazon's answer to Google's head start is a customer. The roughly $8B Amazon committed to Anthropic comes with a condition that matters more than the equity: Anthropic trains on Trainium at enormous scale. It does so through a cluster known as Project Rainier, reported to involve several hundred thousand Trainium2 chips. That gives AWS a frontier-model customer forcing its silicon to actually work, and Trainium3 is now ramping behind it. Amazon claims Trainium2 delivers 30–40% better price-performance than comparable GPU instances on suited workloads.
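Price-performance and cost per token are easy to conflate. Assume "price-performance" means work delivered per dollar on the same workload at the same utilization. That is the usual reading, though vendors choose their own baselines. Under that assumption, a gain of g cuts the cost per unit of work to 1/(1 + g) of the original. Here is a minimal Python sketch of that conversion; the function name is hypothetical.

```python
"""Convert a price-performance gain into a cost-per-token reduction (illustrative)."""


def cost_reduction(price_perf_gain: float) -> float:
    """Fractional drop in cost per unit of work for a given price-performance gain.

    Price-performance is work per dollar, so cost per unit of work is its
    reciprocal: a gain g turns cost C into C / (1 + g).
    """
    if price_perf_gain <= -1.0:
        raise ValueError("price_perf_gain must be greater than -1")
    return 1.0 - 1.0 / (1.0 + price_perf_gain)


if __name__ == "__main__":
    # Amazon's claimed range for Trainium2 on suited workloads.
    for gain in (0.30, 0.40):
        print(f"{gain:.0%} better price-performance -> "
              f"{cost_reduction(gain):.1%} lower cost per token")
```

Run as written, this prints reductions of roughly 23% and 29%. A 30–40% price-performance gain is therefore a somewhat smaller percentage cut in cost per token, which matters when comparing vendor claims quoted in different units.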
Microsoft is the newest of the three. Maia 100, unveiled in late 2023, targets inference for the workloads Microsoft knows best β OpenAI's models and Copilot. With Microsoft guiding to $80B+ in capex and OpenAI consuming staggering amounts of compute, even shifting a slice of inference onto Maia changes the economics. But Maia is a generation or two behind TPU and Trainium in deployment, and Microsoft still leans heavily on Nvidia to serve OpenAI at the scale ChatGPT demands.
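Shifting "a slice" of inference amounts to taking a weighted average, and at hyperscale volume a small share moves a large number. The Python below is a minimal sketch, assuming serving costs are quoted per million tokens. The function names are hypothetical. The per-token prices, daily volume, and share are placeholders, not Microsoft or OpenAI figures.

```python
"""Savings from moving a share of inference onto in-house chips (illustrative)."""

DAYS_PER_YEAR = 365


def blended_cost(gpu_cost: float, custom_cost: float, custom_share: float) -> float:
    """Weighted-average cost per million tokens for a mixed GPU/custom fleet."""
    if not 0.0 <= custom_share <= 1.0:
        raise ValueError("custom_share must be between 0 and 1")
    return (1.0 - custom_share) * gpu_cost + custom_share * custom_cost


def yearly_savings(tokens_per_day: float, gpu_cost: float,
                   custom_cost: float, custom_share: float) -> float:
    """Dollars saved per year versus serving every token on GPUs."""
    million_tokens_per_year = tokens_per_day * DAYS_PER_YEAR / 1e6
    all_gpu = million_tokens_per_year * gpu_cost
    mixed = million_tokens_per_year * blended_cost(gpu_cost, custom_cost, custom_share)
    return all_gpu - mixed


if __name__ == "__main__":
    # Hypothetical: 10 trillion tokens/day, $1.00 vs $0.70 per million tokens,
    # with 20% of traffic moved to custom silicon.
    saved = yearly_savings(10e12, gpu_cost=1.00, custom_cost=0.70, custom_share=0.20)
    print(f"Yearly savings: ${saved / 1e6:,.0f}M")
```

The savings scale linearly with both token volume and the share moved. That is why even a partial shift matters at the volumes a service like Copilot or ChatGPT generates.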
The common thread across all three: a custom chip is worthless without a captive, high-volume workload to justify it. Amazon secured one through its Anthropic deal, Microsoft has one in OpenAI, and Google grew its own in Gemini. See how the model layer is priced in our breakdown of AI company valuations.
Custom AI chips vs Nvidia: where each actually wins
The honest framing isn't custom silicon versus Nvidia as a winner-take-all fight. It's a division of labor that's settling in. Here is where each side has the edge in 2026.
| Dimension | Custom silicon (TPU/Trainium/Maia) | Nvidia GPUs |
|---|---|---|
| Raw peak performance | Trails GB200 on most benchmarks | Industry leader |
| Cost per token (inference) | Claimed 30–40% better price-performance (roughly 23–29% lower cost) on suited workloads | Premium pricing, ~70%+ margin |
| Software ecosystem | JAX/XLA, Neuron SDK: niche | CUDA: universal default |
| Flexibility / new architectures | Optimized for known workloads | Handles anything researchers throw at it |
| Supply control | Owned in-house, no allocation list | Rationed, vendor-controlled |
| Margin captured by buyer | Buyer keeps it | Nvidia keeps ~70%+ |
| Best fit | High-volume, stable inference | Frontier training, R&D, breadth |
The pattern is clear. Custom chips win on the boring, enormous, repetitive workload (inference), where cost per token compounds across billions of requests. Nvidia keeps the frontier and the research labs. It also keeps anyone who needs maximum flexibility or can't rewrite their stack for a new chip. Both can be true at once, which is why the hyperscalers keep buying Nvidia by the billions even as they ship their own. Compare the spend in our Big Tech Earnings Dashboard.
The bull and bear case on custom AI silicon
Bull Case
- Clawing back Nvidia's 70%+ margin on owned workloads is enormous at $270B+ scale
- Inference volume, the biggest and most repetitive workload, fits custom chips perfectly
- Owning silicon ends dependence on Nvidia's allocation list
- Google's decade of TPU proves the model works at frontier scale
- Captive customers (Anthropic, OpenAI, Gemini) anchor real demand
Bear Case
- CUDA lock-in keeps external customers on Nvidia regardless of price
- Chip programs cost billions and take years to pay back
- Microsoft's Maia and Amazon's Trainium are unproven outside captive workloads
- Nvidia's annual release cadence keeps moving the performance target
- Fast-changing model architectures can strand chips optimized for old ones
Nvidia sells the picks. The hyperscalers want the mine.
Custom AI chips aren't about beating Nvidia on speed. They're about deleting Nvidia's 70% margin from the most expensive line item in computing.
Track real-time AI infrastructure and capex data on the AI Spending Dashboard, AI Valuations, and Big Tech Earnings Dashboard at Value Add VC. See also: Amazon AWS AI Capex 2025 and AI Hardware Wars.