Value Add VC
AI & Technology · June 22, 2026 · 10 min read · Last updated: June 22, 2026

Custom AI Chips: Why Google, Amazon, and Microsoft Are Building Their Own Silicon

Nvidia's data-center GPUs carry gross margins north of 70%. That single number explains why the three largest cloud providers spent years and billions designing their own AI chips (Google's TPU, Amazon's Trainium, and Microsoft's Maia) instead of just writing bigger checks.

Trace Cohen
Co-Founder & GP at Six Point Ventures · 3x founder (BrandYourself, Launch.it, SPOT) · 65+ investments · Based in Boca Raton, FL
@Trace_Cohen · t@nyvp.com · South Florida Advisory

Quick Answer

Three custom AI chips (Google's TPU v7, Amazon's Trainium3, and Microsoft's Maia 100) sit at the center of more than $270B in combined 2025 capex, and all three were built to dodge Nvidia's 70%+ GPU margins. Custom silicon can cut cost per workload by 30–40% on suited workloads, which is how hyperscalers defend cloud profitability at trillion-dollar AI scale.

Nvidia's data-center GPUs carry gross margins above 70%, and that one number is why Google, Amazon, and Microsoft each spent billions designing their own AI chips.

That's the short answer. The longer answer is more interesting, because building a competitive AI accelerator is one of the hardest engineering problems in the world, and three of the richest companies on earth decided it was still cheaper than renting Nvidia's silicon forever. They aren't trying to beat Nvidia on raw speed. They're trying to own the cost curve underneath the AI boom, and at over $270B of combined 2025 capex, even a few points of margin is real money.

Why Google, Amazon, and Microsoft are building their own AI chips

Google, Amazon, and Microsoft build custom AI chips (Google's TPU, Amazon's Trainium and Inferentia, and Microsoft's Maia) to escape Nvidia's 70%-plus GPU gross margins, lower cost per token by roughly 30–40% on suited workloads, and reduce reliance on a single supplier they cannot buy enough from. Custom silicon is a margin and supply-chain strategy first, a performance strategy second.

The logic is brutal and simple. When AWS or Azure buys an Nvidia H200, roughly 70 cents of every dollar is Nvidia gross profit. Multiply that across hundreds of thousands of chips and the leakage runs into the tens of billions. Designing your own accelerator is a multi-year, multi-billion-dollar bet, but if it works it converts a permanent tax into an asset you control. The hyperscalers are effectively paying a large fixed cost up front to delete a much larger variable cost forever.
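The fixed-versus-variable trade can be sketched as a break-even calculation. This is a rough illustration, not a model of any real chip program: the program cost, GPU price, and share of margin recaptured below are hypothetical placeholders, and only the 70% gross margin comes from the article.

```python
from dataclasses import dataclass


@dataclass
class ChipProgram:
    """Hypothetical inputs for a build-vs-buy comparison (illustrative values only)."""
    fixed_cost_usd: float        # up-front design and tape-out spend (assumed)
    gpu_price_usd: float         # price paid per merchant GPU (assumed)
    vendor_gross_margin: float   # share of that price that is vendor gross profit
    margin_recaptured: float     # fraction of the vendor margin the buyer keeps (assumed)


def savings_per_chip(p: ChipProgram) -> float:
    """Variable cost avoided for each merchant GPU replaced by an in-house chip."""
    return p.gpu_price_usd * p.vendor_gross_margin * p.margin_recaptured


def break_even_chips(p: ChipProgram) -> float:
    """Number of replaced GPUs at which cumulative savings equal the fixed cost."""
    return p.fixed_cost_usd / savings_per_chip(p)


program = ChipProgram(
    fixed_cost_usd=5e9,          # hypothetical $5B program
    gpu_price_usd=30_000,        # hypothetical GPU price
    vendor_gross_margin=0.70,    # the ~70% figure cited in the article
    margin_recaptured=0.5,       # assume half the margin is lost to in-house costs
)

print(f"Savings per chip: ${savings_per_chip(program):,.0f}")
print(f"Break-even volume: {break_even_chips(program):,.0f} chips")
```

Under these assumed inputs the break-even lands in the hundreds of thousands of chips, which is the order of magnitude the hyperscalers already buy. Changing any input moves the answer substantially, so treat the output as a shape, not a forecast.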

There's a second motive that doesn't show up in margin math: supply. Through 2024 and 2025, Nvidia's best chips were rationed, and even the largest customers waited in line. Owning a chip program means never being fully at the mercy of one vendor's allocation list. Track the full spending picture on the AI Spending Dashboard.

Custom AI chips compared: Google TPU, Amazon Trainium, and Microsoft Maia

The three custom AI chip programs from Google, Amazon, and Microsoft are at very different stages of maturity. Google has a decade head start; Amazon has the most aggressive customer anchor; Microsoft is the newest entrant playing catch-up. Here is how the silicon lines up.

| Company | Chip | First shipped | Status (2026) | Anchor workload |
|---|---|---|---|---|
| Google | TPU v7 (Ironwood) | 2015 (v1) | 7th-gen, in production | Gemini, Google Cloud |
| Amazon | Trainium3 | 2020 (Trn1) | Trn2 volume, Trn3 ramping | Anthropic / Bedrock |
| Amazon | Inferentia2 | 2019 (Inf1) | Generally available | AWS inference serving |
| Microsoft | Maia 100 | 2023 | Deployed, Maia 200 next | OpenAI, Copilot inference |
| Meta | MTIA v2 | 2023 | Internal production | Ranking, recommendations |
| Nvidia | GB200 / H200 | 2022 (H100) | Industry standard | Everyone, via CUDA |

Generations and dates are approximate and blend public disclosures with reporting. The pattern is what matters: Google is years ahead, and everyone else is racing to close the gap. See the head-to-head in AI Hardware Wars: Nvidia vs AMD vs Google TPU.

Google's TPU: the ten-year head start

Google shipped its first Tensor Processing Unit in 2015, years before generative AI was a market. By 2026 it is on its seventh-plus generation, and TPUs train and serve Gemini end to end, meaning Google is the one hyperscaler that can run a frontier model without buying a single Nvidia GPU if it chooses to.

That maturity shows up commercially. Google rents TPU capacity through Google Cloud, and outside customers (including, at various points, Apple and Anthropic) have trained large models on TPU pods. The latest generation, codenamed Ironwood, is built specifically for inference at scale, with pods wiring together thousands of chips over a custom optical interconnect. The strategic edge isn't any single chip; it's that Google has been compounding hardware-software co-design for a decade while competitors were still drawing schematics.

The catch is software. Nvidia's CUDA is the default for nearly every AI researcher on earth, and TPUs are programmed through Google's XLA compiler stack, most commonly via JAX. That friction is why TPUs dominate inside Google and remain a niche choice outside it, a recurring theme for every custom chip.

Amazon's Trainium and Microsoft's Maia: buying their way to relevance

Amazon's answer to Google's head start is a customer. The roughly $8B Amazon committed to Anthropic comes with a condition that matters more than the equity: Anthropic trains on Trainium at enormous scale, through a cluster known as Project Rainier reported to involve several hundred thousand Trainium2 chips. That gives AWS a frontier-model customer forcing its silicon to actually work, and Trainium3 is now ramping behind it. Amazon claims Trainium2 delivers 30–40% better price-performance than comparable GPU instances on suited workloads.

Microsoft is the newest of the three. Maia 100, unveiled in late 2023, targets inference for the workloads Microsoft knows best: OpenAI's models and Copilot. With Microsoft guiding to $80B+ in capex and OpenAI consuming staggering amounts of compute, even shifting a slice of inference onto Maia changes the economics. But Maia is a generation or two behind TPU and Trainium in deployment, and Microsoft still leans heavily on Nvidia to serve OpenAI at the scale ChatGPT demands.

The common thread across both: a custom chip is worthless without a captive, high-volume workload to justify it. Amazon rented one from Anthropic; Microsoft has one in OpenAI; Google grew its own in Gemini. See how the model layer is priced in our breakdown of AI company valuations.

Custom AI chips vs Nvidia: where each actually wins

The honest framing isn't custom silicon versus Nvidia as a winner-take-all fight. It's a division of labor that's settling in. Here is where each side has the edge in 2026.

| Dimension | Custom silicon (TPU/Trainium/Maia) | Nvidia GPUs |
|---|---|---|
| Raw peak performance | Trails GB200 on most benchmarks | Industry leader |
| Cost per token (inference) | 30–40% lower on suited workloads | Premium pricing, ~70%+ margin |
| Software ecosystem | JAX/XLA, Neuron SDK (niche) | CUDA (universal default) |
| Flexibility / new architectures | Optimized for known workloads | Handles anything researchers throw at it |
| Supply control | Owned in-house, no allocation list | Rationed, vendor-controlled |
| Margin captured by buyer | Buyer keeps it | Nvidia keeps ~70%+ |
| Best fit | High-volume, stable inference | Frontier training, R&D, breadth |

The pattern is clear: custom chips win on the boring, enormous, repetitive workload (inference), where cost per token compounds across billions of requests. Nvidia keeps the frontier, the research labs, and anyone who needs maximum flexibility or can't rewrite their stack for a new chip. Both can be true at once, which is why the hyperscalers keep buying Nvidia by the billions even as they ship their own. Compare the spend in our Big Tech Earnings Dashboard.
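To see how a slower chip can still win on inference, here is a minimal cost-per-token sketch. The hourly costs and throughput figures are invented for illustration and do not describe any real instance type; the point is only that cost per token depends on the ratio of price to throughput, not on throughput alone.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical instances: the custom chip is 20% slower but half the hourly cost.
gpu = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=5000)
custom = cost_per_million_tokens(hourly_cost_usd=5.0, tokens_per_second=4000)

# Fractional saving from moving the workload to the custom chip.
saving = 1 - custom / gpu

# Hypothetical volume, to show how a per-token gap compounds at scale.
tokens_per_day = 1e12
daily_saving_usd = (gpu - custom) * tokens_per_day / 1_000_000

print(f"GPU:    ${gpu:.4f} per 1M tokens")
print(f"Custom: ${custom:.4f} per 1M tokens")
print(f"Saving: {saving:.1%}, about ${daily_saving_usd:,.0f} per day at the assumed volume")
```

With these made-up inputs the saving falls inside the 30–40% range the article cites, even though the custom chip loses on raw throughput. A workload that cannot keep the chip busy, or that needs a costly port off CUDA, would erode that gap.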

The bull and bear case on custom AI silicon

Bull Case

  • ✓ Clawing back Nvidia's 70%+ margin on owned workloads is enormous at $270B+ scale
  • ✓ Inference volume, the biggest and most repetitive workload, fits custom chips perfectly
  • ✓ Owning silicon ends dependence on Nvidia's allocation list
  • ✓ Google's decade of TPU proves the model works at frontier scale
  • ✓ Captive customers (Anthropic, OpenAI, Gemini) anchor real demand

Bear Case

  • ✕ CUDA lock-in keeps external customers on Nvidia regardless of price
  • ✕ Chip programs cost billions and take years before they pay back
  • ✕ Microsoft's Maia and Amazon's Trainium are unproven outside captive workloads
  • ✕ Nvidia's annual release cadence keeps moving the performance target
  • ✕ Fast-changing model architectures can strand chips optimized for old ones

Nvidia sells the picks. The hyperscalers want the mine.

Custom AI chips aren't about beating Nvidia on speed; they're about deleting Nvidia's 70% margin from the most expensive line item in computing.

Track real-time AI infrastructure and capex data on the AI Spending Dashboard, AI Valuations, and Big Tech Earnings Dashboard at Value Add VC. See also: Amazon AWS AI Capex 2025 and AI Hardware Wars.


Frequently Asked Questions

Why are Google, Amazon, and Microsoft building their own AI chips?

The core reason is margin. Nvidia's data-center GPUs carry gross margins above 70%, so roughly 70 cents of every dollar a cloud provider spends on an H200 is Nvidia gross profit. By designing TPU, Trainium, and Maia, Google, Amazon, and Microsoft claw back that margin, lower cost per token by 30–40% on suited workloads, and reduce dependence on a single supplier whose chips they cannot get enough of.

What is the difference between Google TPU, AWS Trainium, and Microsoft Maia?

Google's TPU is the most mature, now in its seventh-plus generation (Trillium/Ironwood era) and powering Gemini and Google Cloud. Amazon's Trainium2 is in volume with Trainium3 ramping, anchored by a multi-hundred-thousand-chip Anthropic cluster. Microsoft's Maia 100 is the newest of the three, used internally for OpenAI and Copilot inference. All three target inference cost first, where volume is highest.

Are custom AI chips actually better than Nvidia GPUs?

Usually not on raw performance: Nvidia's GB200 still leads on peak throughput and owns the CUDA software ecosystem developers depend on. Custom chips win on total cost of ownership for specific, high-volume workloads. If a hyperscaler can serve inference on its own silicon at 60% of an Nvidia instance's cost, it keeps the margin or passes savings on to win deals, even if the chip is slower.

How much are hyperscalers spending on AI chips and infrastructure in 2026?

The combined 2025 capex of Amazon (~$105B), Google (~$85B), and Microsoft (~$80B+) exceeds $270B, and 2026 guidance points higher. AI compute (GPUs plus custom silicon) is typically the single largest line, around 40–45% of that spend. Custom chips are a fraction of total volume today but the fastest-growing slice as the providers shift inference off Nvidia.
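The arithmetic behind those figures is easy to check. The sketch below uses only the approximate capex numbers quoted in this answer and treats Microsoft's "$80B+" as a floor of $80B, so the totals are lower bounds, not reported results.

```python
# Approximate 2025 capex in billions of USD, as quoted above.
# Microsoft is stated as "$80B+", so 80 is used here as a floor.
capex_2025_usd_bn = {"Amazon": 105, "Google": 85, "Microsoft": 80}

total_bn = sum(capex_2025_usd_bn.values())

# AI compute is described as roughly 40-45% of total capex.
ai_compute_low_bn = total_bn * 0.40
ai_compute_high_bn = total_bn * 0.45

print(f"Combined capex: at least ${total_bn}B")
print(f"Implied AI compute spend: ${ai_compute_low_bn:.1f}B to ${ai_compute_high_bn:.1f}B")
```

On those inputs, the 40–45% share implies somewhere above $100B a year going to GPUs and custom silicon combined, which is the pool the in-house chips are competing for.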

Does custom silicon threaten Nvidia's business?

Not in the near term. Nvidia still books record data-center revenue and the hyperscalers remain its largest customers, buying H200 and GB200 at scale. The threat is long-term and structural: every major buyer now has an in-house escape hatch, which caps Nvidia's pricing power on inference and commodity training over time. Nvidia's CUDA lock-in and pace of releases are its main defenses.


Trace Cohen is a serial founder, investor, and data geek. Please feel free to reach out at t@nyvp.com.
