92% of AI training workloads ran on Nvidia GPUs in Q1 2026, but AMD MI300X is shipping at about one-third the price of an H200 with 192GB of memory (2.4x an 80GB H100, 1.36x the H200's 141GB), and Google TPU v5p just hit 459 TFLOPs per chip. The chip war is no longer just about FLOPs.
That's the short answer. The longer answer is that the gap between Nvidia and everyone else is closing on memory, on price-per-token of inference, and on hyperscaler willingness to deploy non-CUDA silicon. The H200 still wins for training a frontier foundation model from scratch. For everything else — inference, fine-tuning, vertical AI — the math has changed.
AI Hardware Wars: Nvidia H200 vs AMD MI300 vs Google TPU v5 in 2026
Nvidia H200 is still the default training accelerator in 2026, with 141GB HBM3e, 4.8TB/s memory bandwidth, and an effective monopoly on the CUDA software stack that the largest labs are built on. AMD MI300X is the only credible alternative on price-per-memory, shipping at $10K-$15K with 192GB HBM3. Google TPU v5p is competitive on training cost inside Google Cloud only — it has no on-prem footprint and no third-party sale. Across the full $300B-plus 2025-2026 AI capex cycle, Nvidia captured roughly 75-80% of the spend.
| Spec | Nvidia H200 | Nvidia B200 | AMD MI300X | Google TPU v5p |
|---|---|---|---|---|
| Memory | 141GB HBM3e | 192GB HBM3e | 192GB HBM3 | 95GB HBM2e |
| Memory bandwidth | 4.8 TB/s | 8.0 TB/s | 5.3 TB/s | 2.8 TB/s |
| FP16/BF16 TFLOPs (vendor peak) | 1,979 (with sparsity) | 4,500 (with sparsity) | 1,307 (dense) | 459 (dense) |
| Street price (per unit) | $30K-$40K | $40K-$50K | $10K-$15K | N/A (cloud only) |
| Cloud rental ($/hr) | $3.00-$10.00 | $6.00-$12.00 | $1.99-$4.50 | $2.00-$4.20 |
| Software stack | CUDA (mature) | CUDA (mature) | ROCm 6.x | JAX/XLA |
| Lead time | 8-16 weeks | 12-24 weeks | 4-8 weeks | Cloud only |
| Power per chip | 700W | 1,000W | 750W | 350W |
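
The price-per-memory claim can be checked directly from the table. The sketch below is illustrative only: it takes the midpoint of each quoted street-price range, which is an assumption of this example rather than a figure from the table, and divides by memory capacity. The names `CHIPS` and `usd_per_gb` are hypothetical.

```python
# Memory and street-price figures copied from the spec table above.
# Using the midpoint of each price range is an assumption of this sketch.
CHIPS = {
    "Nvidia H200": {"memory_gb": 141, "price_range_usd": (30_000, 40_000)},
    "Nvidia B200": {"memory_gb": 192, "price_range_usd": (40_000, 50_000)},
    "AMD MI300X": {"memory_gb": 192, "price_range_usd": (10_000, 15_000)},
}


def usd_per_gb(chip: dict) -> float:
    """Midpoint street price divided by on-package memory capacity."""
    low, high = chip["price_range_usd"]
    return (low + high) / 2 / chip["memory_gb"]


for name, chip in CHIPS.items():
    print(f"{name}: ${usd_per_gb(chip):,.0f} per GB of HBM")
```

On these midpoints the H200 works out to roughly 3.8x the MI300X's cost per gigabyte. TPU v5p is left out because it has no unit price.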
The AI Chip Market Share Picture in 2026
Nvidia's data center revenue hit $115B in fiscal 2025 — up from $47B in fiscal 2024 — and is tracking toward $180B-$200B in fiscal 2026. That single line item is roughly 4x the entire prior AMD data center business and 12x the AMD Instinct line. But the share trajectory tells a different story than the absolute dollars.
| Vendor | Share | Note |
|---|---|---|
| Nvidia | 92% | Down from 95%+ in 2023 |
| AMD | 4-5% | Meta, MSFT, Oracle wins |
| Google TPU | 2% | First-party only |
| AWS / Microsoft | 1-2% | Trainium2, Maia |
AMD's MI300X has secured material commitments: Microsoft is running Bing and Copilot inference on tens of thousands of MI300X units, Meta is running 40%+ of Llama 4 inference on AMD, and Oracle Cloud added 30,000+ MI300X chips in 2025. Google's TPU v5p quietly powers most internal Gemini training while Anthropic — Google's biggest external customer — runs Claude inference on a mixed TPU and Nvidia fleet under a multi-year cloud commit worth over $4B.
Total Cost of Ownership: H200 vs MI300X vs TPU v5p
Sticker price hides the real spread. A 1,024-GPU training cluster running for 12 months has dramatically different economics depending on which platform you pick — once you add networking, software engineering, and power. Here is what a frontier-lab-scale build actually costs in 2026.
| Cost line | H200 cluster | MI300X cluster | TPU v5p pod |
|---|---|---|---|
| 1,024 chips (capex) | $36M ($35K avg) | $13M ($12.5K avg) | Cloud only |
| Networking (InfiniBand/RoCE) | $4M | $4M | Included |
| Power per year (1,024 × 700-750W) | $2.1M | $2.3M | Included |
| Datacenter buildout (per MW) | $10M-$12M | $10M-$12M | Included |
| Software engineering (1 year) | $3M-$5M | $8M-$15M | $5M-$8M |
| Reserved cloud equivalent (12mo) | $53M-$72M | $24M-$36M | $30M-$50M |
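
The power and reserved-cloud rows can be sanity-checked with simple arithmetic. The sketch below is a rough check, not a reconstruction of how the table was built: the $0.10/kWh electricity rate and the `pue` overhead parameter are assumptions that do not appear in the article, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming continuous operation


def annual_power_cost_usd(chips, watts_per_chip, usd_per_kwh, pue=1.0):
    """Electricity cost for the accelerators alone over one year.

    usd_per_kwh and pue (facility overhead multiplier) are assumed inputs.
    """
    kwh = chips * watts_per_chip / 1000 * HOURS_PER_YEAR * pue
    return kwh * usd_per_kwh


def implied_hourly_rate(total_usd, chips):
    """Per-chip hourly rate implied by a 12-month cluster price."""
    return total_usd / (chips * HOURS_PER_YEAR)


# Chip-only power for 1,024 H200s at an assumed $0.10/kWh.
print(f"${annual_power_cost_usd(1024, 700, 0.10):,.0f} per year")

# Hourly rates implied by the reserved cloud equivalent row.
for label, low, high in [("H200", 53e6, 72e6), ("MI300X", 24e6, 36e6)]:
    print(f"{label}: ${implied_hourly_rate(low, 1024):.2f}"
          f"-${implied_hourly_rate(high, 1024):.2f} per chip-hour")
```

At the assumed rate, chip-only power comes to well under the table's $2.1M, so that figure evidently covers more than accelerator draw, for example host servers, cooling overhead, or a higher tariff.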
MI300X looks 60% cheaper at the chip line but loses 30-40% of that gap to ROCm porting and slower kernel optimization. For a startup with under $10M of compute spend, the breakeven is rarely there. For Meta — which spent $39B on Llama 4 training and inference — saving even 25% of capex is worth absorbing $200M of software engineering cost. That is the actual reason MI300X is winning at hyperscale and losing at startup scale.
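The gap arithmetic can be reproduced from the cost table. This is a minimal sketch using only the capex and software engineering rows. Pairing the low ends together and the high ends together is an assumption of the example, and the variable names are hypothetical.

```python
# Dollar figures in millions, copied from the cost table above.
H200_CAPEX, MI300X_CAPEX = 36.0, 13.0
H200_SOFTWARE = (3.0, 5.0)     # (low, high) for one year
MI300X_SOFTWARE = (8.0, 15.0)  # (low, high) for one year


def midpoint(bounds):
    low, high = bounds
    return (low + high) / 2


chip_gap = H200_CAPEX - MI300X_CAPEX   # capex saved by choosing MI300X
chip_discount = chip_gap / H200_CAPEX  # saving as a share of H200 capex

# Extra software engineering cost of choosing MI300X in each scenario.
extra_software = {
    "low": MI300X_SOFTWARE[0] - H200_SOFTWARE[0],
    "mid": midpoint(MI300X_SOFTWARE) - midpoint(H200_SOFTWARE),
    "high": MI300X_SOFTWARE[1] - H200_SOFTWARE[1],
}
# Share of the chip-line saving consumed by that extra software cost.
gap_eroded = {case: extra / chip_gap for case, extra in extra_software.items()}

print(f"Chip-line discount: {chip_discount:.0%}")
for case, share in gap_eroded.items():
    print(f"Gap eroded ({case}): {share:.0%}")
```

With these inputs the chip-line discount is about 64%, and the midpoint software scenario consumes about a third of the saving, consistent with the 30-40% range quoted above.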
Training vs Inference: Why the Hardware Question Splits in 2026
The 2024 conversation was almost entirely about training FLOPs. In 2026 the inference market is roughly 60% of total AI compute spend: OpenAI alone is reportedly burning $7B-$9B per year on inference for ChatGPT, and Anthropic is spending $3B-$4B on Claude inference. That shift completely changes the chip math.
Where Nvidia still dominates
- Frontier model pretraining (1T+ params)
- Multi-node training with NVLink/InfiniBand
- Reinforcement learning fine-tuning
- Any workload that needs CUDA + TensorRT
- Research labs at OpenAI, Anthropic, xAI
Where AMD and TPU are gaining
- High-volume inference (MI300X 192GB shines)
- Single-node fine-tuning of open models
- Cost-sensitive batch inference
- First-party clouds with controlled stack
- Anything Meta, Microsoft, or Google runs
AI Hardware Pricing in 2026: What Each GPU Actually Costs Today
Cloud rental is now where the chip war is actually fought. Here are 2026 pricing benchmarks on AWS, Azure, GCP, and the neoclouds (CoreWeave, Lambda, Crusoe, Voltage Park), pulled from public listing prices and enterprise quotes. Hyperscaler rows show on-demand rates; neocloud rows show reserved rates.
| Provider | H100 ($/hr) | H200 ($/hr) | B200 ($/hr) | MI300X ($/hr) |
|---|---|---|---|---|
| AWS (on-demand) | $5.20 | $7.95 | $11.50 | $4.10 |
| Azure (on-demand) | $4.90 | $7.40 | $10.80 | N/A |
| GCP (on-demand) | $5.50 | $8.20 | $12.00 | N/A |
| CoreWeave (reserved) | $2.49 | $3.50 | $5.95 | $2.10 |
| Lambda (reserved) | $2.49 | $3.29 | $5.49 | $1.99 |
| Crusoe (reserved) | $2.45 | $3.40 | $5.85 | $1.95 |
The neoclouds are pricing H100 at $2.49/hr against AWS at $5.20/hr — a 52% discount. The gap is bigger on H200. CoreWeave, Lambda, and Crusoe collectively burned through $30B+ of GPU capex in 2024-2025 and need utilization, which is why every spot price war runs through them first. For startups, the play is almost always: lock 70% reserved on a neocloud, top up with hyperscaler on-demand for spikes.
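The discount and the 70/30 split can be put into numbers. The sketch below uses the H100 and H200 rates from the pricing table. It assumes the 70% refers to the share of GPU-hours bought at the reserved rate, which is one reading of the rule of thumb and not something the table specifies. The function names are hypothetical.

```python
def discount(neocloud_rate, hyperscaler_rate):
    """Fractional discount of the neocloud rate against the hyperscaler rate."""
    return 1 - neocloud_rate / hyperscaler_rate


def blended_rate(reserved_rate, on_demand_rate, reserved_share=0.70):
    """Average hourly rate when reserved_share of GPU-hours are reserved.

    Treating the 70% as a share of GPU-hours is an assumption of this sketch.
    """
    return reserved_share * reserved_rate + (1 - reserved_share) * on_demand_rate


# H100: CoreWeave reserved vs AWS on-demand, from the table above.
print(f"H100 discount: {discount(2.49, 5.20):.0%}")
# H200: CoreWeave reserved vs AWS on-demand.
print(f"H200 discount: {discount(3.50, 7.95):.0%}")
# Blended H100 rate for a 70% reserved, 30% on-demand mix.
print(f"Blended H100 rate: ${blended_rate(2.49, 5.20):.2f}/hr")
```

Under that reading, a 70/30 mix on H100 lands near $3.30 per hour, about 37% below paying AWS on-demand for everything.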
Who's Winning the AI Hardware War: The Honest Call
Nvidia is winning revenue and margin. Gross margin sat at 75%+ through fiscal 2025, and the H200/B200/GB200 product cadence is keeping the next 24 months booked. But three trend lines bend against Nvidia between now and 2028:
Hyperscaler internal silicon
Google TPU v5p, AWS Trainium2, Microsoft Maia 100, and Meta MTIA collectively absorbed roughly 8% of internal AI workloads in 2025 — projected to hit 18-22% by 2027. That is share Nvidia structurally cannot recover, because the buyer also builds.
AMD ROCm catching up
ROCm 6.x reached performance parity with CUDA for transformer inference in late 2025. Beyond inference, the 2-year software gap is closing to roughly 9-12 months. Once that broader gap closes, the $20K-$25K per-chip price gap rests purely on buyer switching cost.
Inference moving to commodity
Inference-only chips from Groq, Cerebras, SambaNova, and d-Matrix are pricing tokens 5-10x cheaper than Nvidia for specific model shapes. None will replace Nvidia for training, but each chips away at the 60% of compute that is inference.
Nvidia wins the 2026 AI hardware war on revenue, margin, and training workloads.
But the 92% share is the peak. AMD, hyperscaler silicon, and inference specialists are about to pull 15-20 points away by 2028 — and that is where the actual investable trade lives.
Originally published in the Trace Cohen newsletter.