For two years, the industry argued about which model was smarter. GPT-4 versus Claude versus Gemini. Benchmark wars. Parameter counts thrown around like batting averages. While everyone debated model size, the companies actually deploying AI at scale discovered the real differentiator: how much a model can hold in its head at once.
The Context Window Explosion Nobody Talks About
In 2020, GPT-3 had a 4,096-token context window, roughly 3,000 words, or about 10 pages. By early 2026, Google's Gemini 1.5 Pro supports 2 million tokens. That's a roughly 500x increase in six years, and it has gone almost entirely undiscussed in mainstream AI coverage relative to its actual importance.
The progression: GPT-3 at 4K tokens (2020) → GPT-4 at 8K/32K (2023) → Claude 2 at 100K (2023) → Gemini 1.5 Pro at 1M (2024) → Gemini 2.0 at 2M (2025). Meanwhile, the AI press spent most of that period covering parameter count benchmarks and leaderboard rankings that correlate weakly with real-world enterprise performance.
I've sat in enough enterprise AI pilots to see this pattern clearly: the features that move POCs to production are almost never about raw model intelligence. They're about whether the model can consume the full document, the full conversation history, the full codebase, without losing the thread.
What Becomes Possible Above 200K Tokens
- Load an entire production codebase (100K+ lines) and debug across files without losing cross-file context
- Ingest a full 400-page M&A data room plus comparable precedents and reason holistically about risk
- Analyze 12 months of customer support transcripts to surface churn patterns no human analyst would find
- Process a full financial model with every assumption, formula, and scenario visible simultaneously
- Review competing expert witness reports in a litigation matter and identify contradictions across thousands of pages
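Whether a workload like the ones above actually fits in a given window is easy to sanity-check before committing to an architecture. A minimal sketch, assuming the common (and rough) four-characters-per-token heuristic; real tokenizer counts vary by language and content:

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenizers vary

def estimate_codebase_tokens(root: str, extensions: tuple = (".py", ".js", ".ts")) -> int:
    """Walk a source tree and estimate its total token count."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN

def fits_in_window(root: str, window_tokens: int = 200_000, budget: float = 0.8) -> bool:
    """Reserve ~20% of the window for the prompt and the model's response."""
    return estimate_codebase_tokens(root) <= window_tokens * budget
```

A check like this is how teams decide whether they can load the whole repository at once or are forced into the chunking pipelines discussed below.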
Why Parameters Are the Wrong Benchmark for Enterprise
Model parameters measure raw neural capacity: the "size of the brain." Context windows measure working memory: "how much the brain can actively think about at once." For most enterprise workflows, working memory is the binding constraint.
A 7B-parameter model with a 1M-token context window can outperform a 70B-parameter model with a 4K context window on document analysis tasks. The smaller model sees the full picture. The larger model is forced to chunk: breaking the document into segments, processing each independently, then stitching answers back together from partial views. The chunking errors compound. The larger model loses despite its raw power.
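The failure mode can be demonstrated with a toy model: even a task as trivial as counting term occurrences degrades once the answer must be assembled from independent segments. A deliberately simplified sketch; real chunking pipelines add overlap and rerankers, which reduce but don't eliminate the boundary problem:

```python
def chunk(text: str, size: int) -> list[str]:
    """Split a document into fixed-size segments, as a small-window pipeline must."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def count_full_context(text: str, term: str) -> int:
    """Large-window model: sees the whole document at once."""
    return text.count(term)

def count_chunked(text: str, term: str, size: int) -> int:
    """Chunked pipeline: process each segment independently, then sum partial answers."""
    return sum(segment.count(term) for segment in chunk(text, size))

doc = "risk " * 10  # the word "risk" appears 10 times

full = count_full_context(doc, "risk")        # 10
chunked = count_chunked(doc, "risk", size=7)  # fewer: occurrences spanning chunk boundaries vanish
```

Any fact that straddles a chunk boundary is invisible to both segments; the reassembly step can only be as good as its partial views.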
This also has cost implications. Larger models cost more per token to run. If a 7B model can do the job with full context, the ROI equation changes dramatically, especially at enterprise scale where you're processing millions of documents. According to Anthropic's internal benchmarks, Claude 3 Haiku with large context outperformed GPT-4 Turbo on several legal document review tasks at one-tenth the inference cost.
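The arithmetic is worth making concrete. A back-of-the-envelope sketch with hypothetical per-token prices, illustrative only, not vendor quotes:

```python
def monthly_cost(docs_per_month: int, tokens_per_doc: int, price_per_million: float) -> float:
    """Inference spend for a document-processing workload (input tokens only)."""
    return docs_per_month * tokens_per_doc * price_per_million / 1_000_000

# Hypothetical prices for illustration only.
small_model_price = 0.25   # $/1M input tokens: small model, large window
large_model_price = 10.00  # $/1M input tokens: large model

docs, tokens = 1_000_000, 50_000  # enterprise scale: 1M documents, 50K tokens each

small = monthly_cost(docs, tokens, small_model_price)  # $12,500/month
large = monthly_cost(docs, tokens, large_model_price)  # $500,000/month
```

At a 40x price gap, the small model clears six figures of monthly savings on this workload before any quality comparison even enters the discussion.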
The Investor Angle: Where Context Creates Moats
For founders and investors, context window dynamics create a specific investment lens. Businesses that win in this environment share two traits: they have accumulated large, structured corpora of domain-specific documents, and they have the tooling to feed those documents into model context efficiently at inference time.
Legal tech is the clearest case. A firm that has 20 years of case files, contract templates, and negotiation outcomes doesn't just have training data; it has a context library. With modern context windows, they can surface relevant precedents in real time by loading the ten most relevant historical documents alongside the current matter. That's not an AI feature. That's a compounding data moat with a 20-year head start.
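One way to picture that context library in operation: rank the archive against the current matter and pack the top documents into a single prompt. A minimal sketch, with naive keyword overlap standing in for real embedding-based retrieval, and all names, budgets, and the chars-per-token heuristic hypothetical:

```python
def overlap_score(query: str, doc: str) -> int:
    """Crude relevance signal: count query terms that appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_context(matter: str, library: dict[str, str],
                  k: int = 10, budget_tokens: int = 900_000) -> str:
    """Pack the k most relevant precedents, plus the current matter, into one prompt."""
    ranked = sorted(library.items(),
                    key=lambda kv: overlap_score(matter, kv[1]),
                    reverse=True)
    parts = [f"CURRENT MATTER:\n{matter}"]
    used = len(matter) // 4  # ~4 chars/token heuristic
    for name, text in ranked[:k]:
        cost = len(text) // 4
        if used + cost > budget_tokens:
            break  # stop before overflowing the model's window
        parts.append(f"PRECEDENT ({name}):\n{text}")
        used += cost
    return "\n\n".join(parts)
```

The firm's edge lives in `library`: a competitor can rent the same model and copy the same packing logic, but not the 20 years of documents behind it.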
I've seen this pattern in healthcare AI, financial services compliance, and enterprise sales intelligence. The companies worth backing aren't the ones with the best prompts or the most expensive model subscriptions. They're the ones sitting on proprietary document repositories that become dramatically more valuable as context windows grow. The context window is the unlock; the data is the moat.
The next phase of AI competition won't be won on leaderboard benchmarks. It will be won by whoever can hold the most relevant context, reason over it reliably, and do it cheaply enough to deploy at enterprise scale.
Originally published in the Trace Cohen newsletter.