Self-hosting your own GPUs only beats managed AI APIs after roughly 5 million requests a day, or about $40,000/month in inference spend. Below that line, buying wins. That's the short answer. The longer answer is more interesting.
I've sat on both sides of this. As a founder I've burned cash on infrastructure we didn't need, and as an investor across 65+ companies I watch portfolio teams agonize over a decision that, nine times out of ten, has an obvious answer they're too proud to accept. Here's the framework I give them.
Build vs Buy AI Infrastructure for a Startup: The Direct Answer
For a startup, buy AI infrastructure โ meaning managed APIs from providers like OpenAI, Anthropic, or Google โ until you exceed roughly 5 million requests per day or $40,000 per month in inference spend. Below that threshold, the fixed cost of GPUs plus the salaries of engineers to run them exceeds what you'd pay per token. Build only when scale, latency under 100ms, or strict data-residency rules make buying impossible.
| Attribute | Buy (Managed APIs) | Build (Self-Hosted GPUs) |
|---|---|---|
| Upfront cost | $0 | $0 (cloud) to $250K+ (owned hardware) |
| Monthly run-rate at low volume | $200โ$5,000 | $30,000โ$40,000 (one 8x H100 node) |
| Per-token cost | $0.50โ$15 / 1M tokens | $0.05โ$0.30 / 1M tokens at full utilization |
| Engineering headcount | 0 dedicated | 1โ2 ML infra engineers ($400K+/yr) |
| Time to first token | 200โ800ms | 50โ150ms (tuned, in-VPC) |
| Time to production | Hours to days | 4โ12 weeks |
| Break-even volume | Wins below ~5M req/day | Wins above ~5M req/day |
| Data residency control | Limited (provider terms) | Full (your VPC) |
The Real Cost of Building AI Infrastructure
Founders see the per-token price of self-hosting โ sometimes 10x cheaper than an API call โ and assume building is the obvious win. It isn't, because that per-token number only materializes at near-100% GPU utilization, which almost no startup achieves.
A reserved 8x H100 node costs roughly $30,000โ$40,000 per month on AWS, GCP, or a neocloud like CoreWeave or Lambda. Spot pricing drops to $2โ$3 per GPU-hour but evaporates mid-job. On top of hardware you need 1โ2 ML infrastructure engineers, each costing $250Kโ$400K fully loaded. Add storage, networking egress, observability, and model-eval pipelines, and a serious self-hosted setup starts around $700Kโ$900K per year before serving a single profitable token.
Then there's utilization. Most startups run inference in bursts โ daytime traffic, batch jobs overnight. If your GPUs sit at 30โ40% utilization, you're paying full freight for idle silicon, and your effective per-token cost balloons past what the API would have charged. The dashboards on our AI spending tracker show even the hyperscalers struggle with utilization at the margin โ and they have data scientists optimizing it full-time.
When Buying AI Infrastructure Wins: The Break-Even Math
Here's the calculation I walk founders through. Say each request averages 1,000 input + 500 output tokens, and you serve a mid-tier model at $1 per million input and $3 per million output tokens. That's about $0.0025 per request on a managed API.
| Daily Requests | Buy: Monthly API Cost | Build: Monthly Cluster Cost |
|---|---|---|
| 100K | ~$7,500 | ~$35,000 |
| 500K | ~$37,500 | ~$35,000 |
| 1M | ~$75,000 | ~$35,000โ$70,000 |
| 5M | ~$375,000 | ~$140,000 (4 nodes) |
| 10M | ~$750,000 | ~$280,000 (8 nodes) |
| 25M | ~$1,875,000 | ~$600,000+ (scaled fleet) |
The crossover sits somewhere between 500K and 1M requests/day on raw cost โ but that ignores the $400K+/year in engineering you need to keep a cluster healthy. Once you fold that in, the honest break-even pushes out to roughly 5M requests/day. Below that, buying isn't just simpler โ it's cheaper.
When to Build AI Infrastructure Anyway
Cost isn't the only axis. There are four situations where I tell founders to build even below the break-even line:
- Hard latency floors. Real-time voice, trading, or robotics use cases that need sub-100ms first-token latency often can't tolerate a third-party API hop. In-VPC inference at 50โ150ms becomes a product requirement, not an optimization.
- Data residency and compliance. Healthcare, defense, and EU-regulated data sometimes legally can't leave your environment. No API SLA fixes a HIPAA or GDPR boundary.
- Deep fine-tuning as moat. If a custom-trained model on proprietary data is your actual differentiation โ not a wrapper around a frontier model โ owning the stack is the business. This is the dividing line I wrote about in AI wrappers vs. AI-native.
- Genuine scale. Above 10M+ requests/day with predictable load, owning capacity at $280K/month beats $750K/month in API fees by a wide enough margin to fund the team.
The Hybrid Path Most Winners Actually Take
The framing of build vs buy AI infrastructure as binary is a trap. The companies I've watched scale well almost always run a hybrid: managed APIs for frontier reasoning and spiky traffic, plus self-hosted open models (Llama 4, Qwen, Mistral) for high-volume, predictable, latency-sensitive paths.
A typical pattern: route 80% of cheap classification and extraction calls to a self-hosted 8B open model at pennies per million tokens, and reserve the expensive frontier API for the 20% of requests that genuinely need it. This caps your blended cost while keeping engineering overhead low. Tools like vLLM, Ollama, and managed inference platforms (Together, Fireworks, Baseten) have collapsed the time-to-production for self-hosting from months to days, which shifts the math โ but not enough to make building the default for a seed-stage team.
The valuation premium for owning real AI infrastructure is real, too โ you can see how the market prices infrastructure-heavy AI companies versus thin wrappers on our AI valuations dashboard. But investors fund the moat, not the GPU bill.
The Bottom Line
Buy until it hurts. For any startup under ~5M requests/day or ~$40K/month in inference, managed APIs are cheaper, faster to ship, and free your engineers to build product. Self-host only when scale, sub-100ms latency, or data-residency rules force your hand โ and even then, run a hybrid. The worst outcome isn't paying an API margin; it's spending $700K/year and two of your best engineers on infrastructure that gives your customers nothing they can feel. Spend on the moat, rent the rest.