AI & Technology · May 29, 2026 · 9 min read · Last updated: May 29, 2026

Amazon AWS AI Investment: $100B+ in Capex and the Race to Power AI Workloads

Amazon is outspending every other hyperscaler on AI infrastructure in 2025. The $100B+ commitment is not a bet on the future; it is a defensive moat being poured right now, and the gap between AWS and everyone else is widening.

Trace Cohen
3x founder, 65+ investments, building Value Add VC

Quick Answer

Amazon AWS AI capex for 2025 exceeded $100B, more than Microsoft ($80B), Google ($75B), or Meta ($65B). The majority funds Trainium2 custom chip clusters, new AWS regions, and the infrastructure behind Bedrock, SageMaker, and Amazon Q. AWS remains the dominant AI cloud platform by workload count and revenue, generating $108B in 2024 revenue with 17% YoY growth.

Amazon committed over $100B in capital expenditures for 2025, the largest infrastructure bet of any public company in history, and the number keeps moving up.

When Andy Jassy told analysts in February 2025 that capex would "meaningfully exceed" 2024's $83B, he was signaling something structural, not opportunistic. AWS is not spending to keep up. It is spending to widen a moat that took 20 years to build.

Amazon AWS AI Capex in 2025: Where Every Dollar Goes

Amazon does not break out AWS capex from the total, but analysts and disclosed project data paint a clear picture. The $100B+ flows into three categories:

  • Custom Silicon (~30%): Trainium2 chip manufacturing, Inferentia2 inference processors, and Graviton4 for general compute. Amazon is aggressively reducing NVIDIA dependency.
  • Data Center Construction & Land (~45%): New hyperscale campuses in Virginia, Oregon, Ohio, Ireland, Singapore, and Japan. Amazon is adding 6+ new AWS regions in 2025 alone.
  • Networking & Power Infrastructure (~25%): Ultra-low latency fabric between GPU clusters, dedicated fiber routes, and backup power investments including nuclear and solar PPAs.
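As a rough illustration of what those shares mean in dollars, the sketch below applies the approximate percentages above to the $100B floor. Both inputs are assumptions: Amazon does not disclose the split, and the real total is guided to be higher than $100B. The names `TOTAL_CAPEX_B`, `SHARES`, and `allocate` are hypothetical and exist only for this example.

```python
# Rough dollar split of the 2025 capex, using the article's approximate shares.
# TOTAL_CAPEX_B is the $100B floor (assumption); the true split is undisclosed.
TOTAL_CAPEX_B = 100.0

SHARES = {
    "Custom Silicon": 0.30,
    "Data Center Construction & Land": 0.45,
    "Networking & Power Infrastructure": 0.25,
}


def allocate(total_b: float, shares: dict[str, float]) -> dict[str, float]:
    """Return the dollar amount (in $B) per category; shares must sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: total_b * share for name, share in shares.items()}


allocation = allocate(TOTAL_CAPEX_B, SHARES)
for name, dollars in allocation.items():
    print(f"{name}: ~${dollars:.0f}B")
```

Swapping in a higher total scales every bucket proportionally, which is the main reason to keep the shares and the total as separate inputs.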

Project Rainier: The Largest AI Compute Cluster Ever Announced

In late 2024, Amazon and Anthropic revealed Project Rainier, a cluster of 400,000 Trainium2 chips dedicated to training Anthropic's next-generation Claude models. To put that in context: a typical hyperscale AI training cluster runs 10,000–30,000 GPUs. By chip count, Rainier is 13–40x larger.
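The 13–40x figure is simple division, sketched below with the numbers quoted above. It compares chip counts only; it assumes nothing about how one Trainium2 chip compares to one GPU in performance. The function name `scale_multiple` is hypothetical.

```python
# Chip-count comparison only; says nothing about per-chip performance.
RAINIER_CHIPS = 400_000
TYPICAL_CLUSTER_RANGE = (10_000, 30_000)  # GPUs in a typical training cluster


def scale_multiple(chips: int, typical_range: tuple[int, int]) -> tuple[float, float]:
    """Return (smallest, largest) size multiple versus the typical range."""
    low, high = typical_range
    return chips / high, chips / low


smallest, largest = scale_multiple(RAINIER_CHIPS, TYPICAL_CLUSTER_RANGE)
print(f"Rainier is roughly {smallest:.0f}x to {largest:.0f}x larger by chip count")
```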

This is not Amazon building training infrastructure for its own models. It is Amazon betting that whoever wins the frontier model race will need compute that only AWS can provide at this scale, and locking in Anthropic as an anchor tenant while making the bet.

Amazon has invested over $4B in Anthropic across two tranches. The financial relationship is inseparable from the compute relationship: Anthropic trains on AWS, sells through AWS Bedrock, and gives Amazon right of first refusal on new model access. This is vertical integration masquerading as a partnership.

How Amazon AWS AI Capex Compares to the Other Hyperscalers

Company | 2024 Capex | 2025 Guidance | Growth / AI Signal
Amazon (AWS) | $83B | $100B+ | ~17% cloud revenue growth
Microsoft (Azure) | $56B | $80B | ~33% AI-attributed
Google (GCP) | $52B | $75B | ~28% GCP growth
Meta (AI infra) | $38B | $64–72B | N/A (internal use)

Sources: Company earnings calls, Q4 2024 filings. 2025 figures are guidance or analyst consensus as of Q1 2025.
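To read the table as year-over-year change rather than absolute dollars, the sketch below computes the implied capex growth for each company. Two simplifications are assumptions of this example, not figures from the companies: Amazon's "$100B+" is treated as exactly $100B, and Meta's $64–72B range is collapsed to its $68B midpoint.

```python
# (2024 capex, 2025 guidance) in $B, taken from the table above.
# Assumptions: Amazon's "$100B+" is floored at 100; Meta uses the range midpoint.
CAPEX_B = {
    "Amazon (AWS)": (83.0, 100.0),
    "Microsoft (Azure)": (56.0, 80.0),
    "Google (GCP)": (52.0, 75.0),
    "Meta (AI infra)": (38.0, (64.0 + 72.0) / 2),
}


def yoy_growth_pct(previous: float, current: float) -> float:
    """Year-over-year growth as a percentage."""
    return (current - previous) / previous * 100.0


growth = {name: yoy_growth_pct(prev, cur) for name, (prev, cur) in CAPEX_B.items()}
for name, pct in growth.items():
    print(f"{name}: {pct:.1f}% capex growth, 2024 to 2025")
```

Because Amazon's figure is a floor, its computed growth is a lower bound rather than a point estimate.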

Amazon's Custom Silicon Strategy: The NVIDIA Hedge

Every hyperscaler is running the same play: build custom silicon to reduce dependence on NVIDIA, compress inference costs, and capture the margin that NVIDIA currently takes on H100/H200 GPUs. Amazon is the furthest along.

  • Trainium2 (model training): 4x better perf/dollar vs H100 on transformer workloads
  • Inferentia2 (model inference at scale): 45% lower cost per inference vs GPU alternatives
  • Graviton4 (general compute, CPU-bound tasks): 30% better price-performance vs x86 for web/app workloads
  • Nitro (hypervisor & security layer): Near-bare-metal performance for EC2 instances

Amazon still buys enormous quantities of NVIDIA H100 and H200 GPUs because customers demand them, and the H200 remains the default for frontier model training. But the custom silicon strategy is working: Bedrock's cheapest inference options today run on Inferentia2, not NVIDIA, and the cost gap is visible in the pricing.

What Amazon's AI Capex Means for Startups and Enterprise Buyers

Every dollar Amazon spends on Trainium2 clusters and data centers is deflationary for AI compute prices, eventually. The path is predictable: Amazon builds massive fixed-cost infrastructure, amortizes it over years, and drops prices to maintain volume. This happened with storage (S3), compute (EC2), and database (RDS). AI inference is next.

What This Unlocks for Founders

  • ✓ Inference costs falling 40–60% year-over-year on Bedrock
  • ✓ Bedrock gives access to Claude, Llama, Titan without managing model infra
  • ✓ SageMaker provides managed fine-tuning, RAG pipelines, and RLHF tooling
  • ✓ Amazon Q gives enterprises an AI assistant without custom deployment
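To see what a 40–60% annual decline in inference costs compounds to, here is a minimal sketch. The starting cost of $1.00 per unit and the three-year horizon are hypothetical placeholders, and the assumption that the decline rate holds steady every year is this example's, not a forecast.

```python
def projected_cost(start_cost: float, annual_decline: float, years: int) -> float:
    """Cost after `years` of compounding decline at `annual_decline` (0 to 1)."""
    if not 0.0 <= annual_decline < 1.0:
        raise ValueError("annual_decline must be in [0, 1)")
    return start_cost * (1.0 - annual_decline) ** years


START_COST = 1.00  # hypothetical: $1.00 per unit of inference today
YEARS = 3          # hypothetical planning horizon

slow = projected_cost(START_COST, 0.40, YEARS)  # low end of the quoted range
fast = projected_cost(START_COST, 0.60, YEARS)  # high end of the quoted range
print(f"After {YEARS} years: ${fast:.3f} to ${slow:.3f} per unit")
```

The gap between the two ends widens quickly with time, so budget models built on this range are very sensitive to which end actually materializes.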

What Founders Should Watch

  • ⚠ AWS lock-in is real: switching clouds mid-scale is expensive
  • ⚠ Amazon builds competing managed services once a category matures
  • ⚠ Bedrock's model catalog includes your competitors' AI too
  • ⚠ Reserved instance commitments lock in pricing before you know your usage

The Return Math: Can $100B in Capex Pay Off?

AWS generated $108B in revenue in 2024 at an operating margin above 37%, roughly $40B in operating profit. That is the engine funding the $100B capex cycle. At current growth rates (17% in 2024), AWS reaches $126B in 2025 revenue. If AI accelerates growth to 20–25%, you are looking at $130–135B.
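The arithmetic behind those figures can be reproduced in a few lines. This sketch uses only the numbers quoted above; the 37% margin is treated as exact even though the text says "above 37%", so the profit figure is a lower bound.

```python
REVENUE_2024_B = 108.0    # AWS 2024 revenue, $B
OPERATING_MARGIN = 0.37   # "above 37%" in the text; treated as a floor here


def project_revenue(revenue_b: float, growth: float) -> float:
    """Next-year revenue at a given growth rate (0.17 means 17%)."""
    return revenue_b * (1.0 + growth)


operating_profit_b = REVENUE_2024_B * OPERATING_MARGIN
scenarios = {g: project_revenue(REVENUE_2024_B, g) for g in (0.17, 0.20, 0.25)}

print(f"2024 operating profit: ~${operating_profit_b:.0f}B")
for growth_rate, revenue in scenarios.items():
    print(f"2025 revenue at {growth_rate:.0%} growth: ~${revenue:.0f}B")
```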

The return math on data center infrastructure works over 10–15 year asset lives. Amazon is not expecting a 2025 return on 2025 capex. It is building physical capacity that takes years to fill and decades to depreciate. The companies that won cloud 1.0 were the ones that committed to infrastructure before demand justified it.
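A simple way to see why asset life matters is straight-line depreciation, sketched below. Treating the whole $100B as a single asset depreciated evenly over 10 or 15 years is a simplifying assumption for illustration; real capex mixes asset classes with different useful lives, and this is not Amazon's accounting schedule.

```python
def straight_line_annual(cost_b: float, life_years: int) -> float:
    """Annual straight-line depreciation expense in $B (no salvage value)."""
    if life_years <= 0:
        raise ValueError("life_years must be positive")
    return cost_b / life_years


CAPEX_B = 100.0  # the $100B floor, treated as one asset (assumption)

annual_expense = {life: straight_line_annual(CAPEX_B, life) for life in (10, 15)}
for life, expense in annual_expense.items():
    print(f"{life}-year life: ~${expense:.1f}B depreciation per year")
```

Spread this way, a single year of capex shows up as a much smaller annual charge, which is why the payoff is judged over the life of the assets rather than in the year the money is spent.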

Track the AI infrastructure spending cycle across all hyperscalers on the AI Spending Dashboard and see how it flows through earnings on the Big Tech Earnings tracker at Value Add VC.

The $100B question is not whether Amazon can afford to spend this much.

It is whether anyone else can afford not to, and whether you are building on the right side of this infrastructure bet.

Originally published in the Trace Cohen newsletter.

Frequently Asked Questions

How much is Amazon spending on AI infrastructure in 2025?

Amazon guided to over $100B in total capex for 2025, the largest of any public company. The majority is directed at AWS AI infrastructure: custom silicon (Trainium2, Inferentia2), data center expansion across 6+ new regions, and the networking fabric needed to run frontier AI workloads at scale.

What is Amazon building with its $100B AI capex?

The spending breaks down into three buckets: custom AI chips (Trainium2 clusters for training, Inferentia2 for inference), new data center campuses in the US, Europe, and Asia, and managed AI services infrastructure including Bedrock, SageMaker, and Amazon Q. Project Rainier, a 400,000-chip Trainium2 cluster, is the largest single AI compute project Amazon has announced.

How does Amazon AWS AI capex compare to Microsoft and Google?

Amazon's $100B+ exceeds Microsoft's $80B and Google's $75B commitments for 2025. The key difference is that Amazon's spend runs through AWS, a standalone profit center generating $108B in 2024 revenue. Microsoft's and Google's AI capex is more distributed across consumer and enterprise products, making Amazon's infrastructure bet more capital-concentrated but also more directly monetizable.

What is AWS Trainium2 and why does it matter?

Trainium2 is Amazon's second-generation custom AI training chip, designed to deliver better performance-per-dollar than NVIDIA H100 GPUs for large model training jobs. Amazon claims up to 4x better throughput per dollar on common transformer workloads. This reduces Amazon's dependency on NVIDIA supply chains and gives AWS a cost advantage it can pass to enterprise customers buying reserved compute.

Is Amazon AWS still the dominant AI cloud platform?

Yes. AWS holds roughly 31% of global cloud market share and hosts a disproportionate share of AI workloads because most AI startups were already AWS customers before the AI wave hit. Bedrock, which offers managed access to Claude, Llama, Titan, and other models, had over 10,000 active customers as of late 2024, a number that has grown rapidly since.
