The hyperscaler vs colocation vs edge debate is no longer theoretical β it's where hundreds of millions in AI compute budgets are being allocated right now.
The answer depends entirely on workload type, latency tolerance, and cost structure. Training, inference, and edge are not interchangeable β each has a natural habitat. Getting this wrong costs real money. Getting it right can cut compute bills by more than half.
Here's where AI workloads are actually running in 2026, broken down by use case, cost, and architecture decision.
Hyperscalers vs Colocation vs Edge: The Architecture Split
| Workload Type | Default Location | Approx. H100 Cost/hr | Why |
|---|---|---|---|
| Large model training | Hyperscaler (AWS/Azure/GCP) | $5β7 (spot), $8β12 (on-demand) | Burst capacity, managed checkpointing, no hardware risk |
| Sustained inference (high volume) | Colocation / dedicated GPU cloud | $2.50β3.50 fully loaded | 40β60% cost savings vs cloud at predictable load |
| Low-latency inference (<10ms) | On-prem or edge node | CapEx amortized | Latency and data sovereignty requirements |
| Fine-tuning / experimentation | Hyperscaler (spot) | $2β5 (spot) | On-demand, no commitment, fast iteration |
| Automotive / robotics inference | Edge hardware (NVIDIA Orin) | SoC CapEx | No network dependency, sub-5ms required |
Hyperscalers: Still the Default for AI Training
AWS, Azure, and GCP collectively control roughly 67% of the cloud infrastructure market. For AI training, their dominance is even higher β because training is a bursty, variable workload that requires on-demand access to hundreds or thousands of GPUs simultaneously.
AWS leads with ~32% market share and the deepest H100/H200 availability. Azure's partnership with OpenAI gives it exclusive early access to the latest OpenAI models via Azure OpenAI Service β a unique enterprise distribution advantage. GCP's TPU v5 clusters remain the only practical option for training at the scale of Gemini-class models.
AWS
Market share: ~32%
GPU depth, SageMaker, Bedrock
Model access: Titan, Anthropic via Bedrock
Azure
Market share: ~23%
Enterprise integration, OpenAI partnership
Model access: GPT-4o via Azure OpenAI
GCP
Market share: ~12%
TPU access, Vertex AI, Gemini-native
Model access: Gemini 2.5 Pro
The hyperscaler advantage for training is not just hardware availability β it's the managed tooling stack. Distributed training across thousands of GPUs requires checkpointing, fault tolerance, and network fabric optimization that's genuinely hard to replicate in colocation without significant engineering overhead.
Colocation Is Eating AI Inference
Once a model is trained, inference is a completely different cost problem. Inference is predictable, sustained, and cost-sensitive. Running a token generation workload at consistent load 24/7 on hyperscaler on-demand pricing is dramatically more expensive than buying or leasing equivalent GPU capacity in colocation.
The math is stark: an H100 SXM5 at AWS on-demand pricing runs $8β12/hour. The same GPU in a well-run colocation environment β including power, cooling, networking, and amortized hardware β lands at $2.50β3.50/hour all-in. At $1M/month in inference compute, that difference is $400Kβ500K in savings per month.
CoreWeave
GPU-native cloud, built for AI inference
Lowest on-demand H100 pricing in dedicated GPU cloud, InfiniBand fabric
Lambda Labs
Reserved GPU clusters, on-demand burst
Strong ML researcher community, competitive monthly pricing
Crusoe Energy
Stranded energy datacenter GPU compute
Below-market power costs using flared gas; carbon argument for ESG LPs
Equinix / Digital Realty
Traditional colo for bring-your-own hardware
Best colocation interconnect; enterprises buying H200s and colocating directly
OpenAI, Anthropic, and Mistral all run significant portions of their inference infrastructure outside of public hyperscaler clouds β either in dedicated GPU cloud (CoreWeave has been reported as a major OpenAI vendor) or in owned infrastructure. The economics at their scale make anything else untenable.
Edge AI: Real, But Narrower Than the Hype
Edge AI means running inference on-device or at local infrastructure β with no dependency on a cloud round-trip. It's not competing with hyperscalers for the same workloads. It addresses a different problem: what happens when network latency is physically incompatible with the application requirement.
The primary edge AI use cases in 2026 are: autonomous vehicles (NVIDIA Orin chips, sub-5ms required), industrial robotics and quality inspection, on-device LLMs in smartphones (Apple Intelligence, Samsung AI), real-time video surveillance and analysis, and on-prem enterprise deployments where data cannot leave the building.
Edge AI Is Right When:
- β Latency requirement is <5ms (network RTT alone >5ms)
- β Data sovereignty prevents cloud transmission
- β Offline operation required (vehicles, manufacturing floor)
- β Bandwidth costs exceed compute cost at scale (video streams)
Edge AI Is Wrong When:
- β Model size exceeds on-device memory (any 70B+ parameter model)
- β You need rapid model updates without physical access
- β Use case tolerates 50β200ms cloud roundtrip latency
- β Hardware CapEx is prohibitive vs compute-as-service pricing
The Hybrid Architecture Most AI Companies Are Actually Running
In practice, virtually every serious AI company in 2026 runs a hybrid architecture. The pattern that has emerged:
Experimentation & Training
Hyperscaler spot GPU (AWS P5, Azure NDv5, GCP A3)
No commitment, maximum flexibility, preemptible for 60β80% discount
Production Training (recurring)
Hyperscaler reserved instances or dedicated GPU cloud
1-year commits cut on-demand by 30β40%; CoreWeave offers competitive alternatives
High-volume Inference
Colocation or dedicated GPU cloud
40β60% cost savings vs on-demand cloud at sustained load
Latency-Critical Inference
On-prem or regional edge nodes
Sub-10ms requirement, data locality, or regulatory constraint
Development & Fine-tuning
Hyperscaler (managed Jupyter/SageMaker)
Developer productivity tooling; not cost-sensitive at low utilization
Power Constraints Are Reshaping the Map
The deeper constraint behind all three deployment models is power. Hyperscalers have the advantage of reserving large power blocks β Microsoft, Google, and Amazon have each signed 100β500MW campus deals for new AI data center construction. Individual colocation customers rarely get more than 5β20MW per deployment.
This power dynamic matters for long-term infrastructure strategy. As H200 and Blackwell clusters consume 700Wβ1,000W per GPU, a 1,000-GPU cluster draws 700kWβ1MW just on compute β before accounting for cooling (typically 1.2β1.5x overhead). At scale, only hyperscalers and the largest dedicated GPU cloud operators have the power commitments to support clusters above 1,000 GPUs.
Track the full picture of AI infrastructure capex and power demand on the AI Spending Dashboard at Value Add VC.
The infrastructure decision is not a values statement about cloud vs. on-prem.
It's a unit economics decision. Train in the cloud. Infer in colo. Deploy at the edge only when latency demands it.
Track AI infrastructure spending and hyperscaler capex data on the AI Spending Dashboard at Value Add VC. Originally published in the Trace Cohen newsletter.