What is the Meta Llama 4 release date, and which models were included?

Meta released Llama 4 on April 5, 2025. The release included three models: Llama Scout (17B active parameters, 10 million token context window, 16-expert MoE), Llama Maverick (17B active parameters, 128-expert MoE), and Llama Behemoth (approximately 2 trillion total parameters, still in training at launch). All three use a Mixture of Experts architecture.

How does Llama Maverick compare to GPT-4 and Claude Sonnet?

At launch, Llama Maverick scored 1417 on the LM Arena leaderboard, placing above GPT-4 (1411) and Gemini 2.0 Flash Experimental. It performs strongly on coding, math, and multilingual tasks. Claude Sonnet 4.6 and GPT-5 have since raised the bar, but Maverick remains competitive at a fraction of the API cost — or free when self-hosted.

Can companies use Llama commercially?

Yes, with restrictions. Meta's Llama license permits commercial use but requires attribution and prohibits using the model to train competing foundation models. Companies with more than 700 million monthly active users must request a special commercial license from Meta. For most startups and enterprises, the license is effectively open for commercial deployment.

Why does open-weight AI leadership matter for the AI market?

Open-weight frontier models commoditize the base layer of AI. When a model matching closed-source quality is freely downloadable, it eliminates the moat that API pricing creates for OpenAI and Anthropic. Enterprises can self-host, fine-tune on proprietary data, and control their cost structure. This forces closed providers to compete on ecosystem, speed, and reliability rather than raw capability alone.

What is the Llama Scout 10 million token context window used for?

Llama Scout's 10 million token context window — the largest of any production model at its launch — enables processing of entire codebases, legal document libraries, or years of customer conversations in a single prompt. Practical enterprise use cases include full-repository code review, large-scale document analysis, and long-form agentic workflows that previously required complex retrieval-augmented generation pipelines.

Meta Llama 4: Open-Weight Leadership in the AI Market

The Meta Llama release answered a question the AI industry had been circling for two years: can open-weight models reach frontier quality? The answer is yes — and the implications go far beyond benchmark scores.

Scout runs a 10 million token context window, the largest of any production model at launch. Maverick hit 1417 on LM Arena, beating GPT-4. Behemoth, at roughly 2 trillion total parameters, was still training when Meta shipped the others. This wasn't an "open source catch-up" story. It was Meta establishing capability parity while making the models free to download and self-host.

What the Llama Models Actually Are

All three Llama models use a Mixture of Experts (MoE) architecture — meaning only a fraction of total parameters activate per token, making them far more compute-efficient than dense models of similar capability.
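To make "only a fraction of parameters activate per token" concrete, here is a minimal sketch of generic top-k expert routing. It is a textbook illustration, not Meta's implementation: the router scores are random placeholders, and the function names (`softmax`, `route_token`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k=1):
    """Pick the top_k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}


# Hypothetical router output for a single token over 16 experts.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(16)]

assignment = route_token(scores, top_k=1)
active_fraction = len(assignment) / len(scores)
print(f"experts used: {sorted(assignment)}; share of experts active: {active_fraction:.4f}")
```

Every other expert's weights sit idle for that token, which is why an MoE model's compute per token tracks its active parameter count rather than its total.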

Model	Active Params	Total Params	Context Window	Experts
Llama Scout	17B	109B	10M tokens	16
Llama Maverick	17B	400B	1M tokens	128
Llama Behemoth	~288B	~2T	TBD	16

Source: Meta AI research blog, April 2025. Behemoth specifications are approximate; model was still in training at launch.
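The table's efficiency claim can be checked with simple arithmetic. The sketch below uses only the parameter counts from the table above (in billions, with Behemoth's approximate figures) and computes the share of weights that participate in each token; the dictionary layout and function name are illustrative choices.

```python
# Parameter counts in billions, taken from the table above.
MODELS = {
    "Llama Scout": {"active_b": 17, "total_b": 109},
    "Llama Maverick": {"active_b": 17, "total_b": 400},
    "Llama Behemoth": {"active_b": 288, "total_b": 2000},  # approximate
}


def active_fraction(active_b, total_b):
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


for name, params in MODELS.items():
    share = active_fraction(params["active_b"], params["total_b"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

On these figures Scout activates roughly 16% of its weights per token, Maverick about 4%, and Behemoth about 14%.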

Why the Benchmark Numbers Matter (and Where They Don't)

An experimental chat version of Maverick scored 1417 on LM Arena, which placed it above GPT-4 at launch, a remarkable result for an open-weight model. On MMLU-Pro (expert-level reasoning), Maverick scored 80.5% versus GPT-4's 74.4%. On the MATH benchmark, it hit 93.2%. These are not cherry-picked edge cases. This is a generalist model matching closed frontier quality.

The caveat: benchmarks are a floor, not a ceiling. In internal testing at several engineering teams I've spoken with, Llama Maverick's real-world coding performance lagged Claude Sonnet and GPT-4 on complex multi-file tasks. Instruction following and long-horizon agent reliability are areas where closed providers still have an edge in production. The gap is narrowing, but it exists.

Strong at launch

- MMLU-Pro: 80.5% (vs GPT-4 74.4%)
- MATH: 93.2%
- Multilingual benchmarks
- LM Arena score: 1417 (above GPT-4)

Still trails closed models

- Complex multi-file code generation
- Long-horizon agent reliability
- Instruction-following edge cases
- Enterprise SLA / uptime guarantees

What the Meta Llama Release Does to the AI Market

Open-weight frontier models don't just compete; they change the economics of the entire market. When you can download Maverick and self-host it with no per-token fee, paying only for your own compute, the conversation about OpenAI's $15/M input pricing for GPT-4 becomes unavoidable for any procurement team.

Enterprises

Self-hosting Llama Scout or Maverick on AWS/GCP/Azure can cut AI inference costs 60–80% vs closed API pricing for high-volume workloads. Privacy-sensitive industries (healthcare, finance, legal) get to keep data on-premises without sacrificing capability. The procurement calculus has fundamentally changed.
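Whether self-hosting actually saves money depends on volume. The sketch below compares a metered API bill with a fixed GPU bill. Every number in it is a hypothetical placeholder (token volumes, the $2.50 per million token price, GPU count, hourly rate, and ops overhead), chosen only to show the shape of the calculation, so substitute your own quotes before drawing conclusions.

```python
def api_cost(tokens_per_month, price_per_million):
    """Monthly bill for a metered, pay-per-token API."""
    return tokens_per_month / 1_000_000 * price_per_million


def self_host_cost(gpu_hours_per_month, price_per_gpu_hour, fixed_ops_cost):
    """Monthly bill for rented GPUs plus a flat operations overhead."""
    return gpu_hours_per_month * price_per_gpu_hour + fixed_ops_cost


def savings_fraction(api, hosted):
    """Positive when self-hosting is cheaper, negative when it costs more."""
    return (api - hosted) / api


# Hypothetical deployment: 8 GPUs running all month (720 hours each).
hosted = self_host_cost(gpu_hours_per_month=8 * 720,
                        price_per_gpu_hour=2.00,
                        fixed_ops_cost=5_000)

# Hypothetical high-volume workload: 20 billion tokens per month.
high_volume = savings_fraction(api_cost(20_000_000_000, 2.50), hosted)

# Hypothetical low-volume workload: 1 billion tokens per month.
low_volume = savings_fraction(api_cost(1_000_000_000, 2.50), hosted)

print(f"high volume savings: {high_volume:.0%}")
print(f"low volume savings: {low_volume:.0%}")
```

With these placeholder inputs the high-volume case saves about two thirds of the API bill, while the low-volume case costs several times more than the API. The fixed GPU bill is the reason the savings apply to high-volume workloads specifically.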

AI Startups Building on APIs

If your product is a layer on top of a closed model and Llama can do the same job for free, you have a business model problem. The companies that survive this are the ones with proprietary data, vertical workflows, or distribution that extends beyond raw model access. Commodity capability is now the floor, not the ceiling.

OpenAI & Anthropic

Llama doesn't kill them — it forces them to compete on ecosystem, reliability, and frontier-exclusive capabilities (reasoning models, deep tool use, agent frameworks). The $300B+ combined valuation of these labs rests on the assumption they can maintain capability leads. Open-weight parity compresses that window. See our AI Valuations dashboard for where these companies are priced today.

The 10 Million Token Context Window Is a Product Decision, Not a Tech Demo

Scout's 10M-token context window is larger than anything GPT-4, Claude Sonnet, or Gemini Pro offered at Llama's launch. To put it in practical terms: at the common rule of thumb of about 0.75 English words per token, 10 million tokens is roughly 7.5 million words. That is 7,500 to 15,000 pages of text depending on page density, or an entire mid-sized codebase, or years of customer support transcripts. In a single prompt.
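Any page estimate rests on two conversion assumptions, neither of which comes from Meta: about 0.75 English words per token (a common rule of thumb) and a chosen number of words per page. A short sketch of the arithmetic, with both values exposed as parameters:

```python
def tokens_to_pages(tokens, words_per_token=0.75, words_per_page=500):
    """Convert a token budget into an approximate page count.

    Both defaults are rule-of-thumb assumptions, not measured values.
    """
    words = tokens * words_per_token
    return words / words_per_page


CONTEXT_TOKENS = 10_000_000

standard_pages = tokens_to_pages(CONTEXT_TOKENS)                    # 500 words per page
dense_pages = tokens_to_pages(CONTEXT_TOKENS, words_per_page=1000)  # very dense pages

print(f"{standard_pages:,.0f} pages at 500 words per page")
print(f"{dense_pages:,.0f} pages at 1,000 words per page")
```

Under these assumptions the window holds about 7.5 million words: 15,000 pages at 500 words per page, or 7,500 pages at a dense 1,000 words per page. Code and non-English text tokenize differently, so treat the figures as order-of-magnitude only.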

The traditional answer to "how do you work with more context than a model supports" was RAG — retrieval-augmented generation, where you chunk documents and retrieve relevant pieces. RAG works but introduces retrieval errors, chunking tradeoffs, and engineering complexity. A 10M-token context window doesn't eliminate RAG for all use cases, but it makes brute-force approaches viable for a much larger set of enterprise problems.

This matters especially for legal tech, healthcare documentation, financial analysis, and code review — verticals where the entire corpus needs to be in-context for accurate reasoning. Companies building in these spaces should be actively evaluating Scout-based architectures against their RAG implementations.
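One way to frame that evaluation is as a simple budget check: if the whole corpus fits in the window with room left for the answer, send it all; otherwise fall back to retrieval. The sketch below is a hypothetical helper, not part of any library. It estimates tokens at roughly four characters each, where a real system would use the model's tokenizer, and the 128,000-token comparison window is an arbitrary example.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; a real system would use the model's tokenizer."""
    return -(-len(text) // chars_per_token)  # ceiling division


def choose_strategy(documents, context_limit, reserved_for_answer=4_000):
    """Return ('full_context' or 'retrieval', estimated corpus tokens)."""
    budget = context_limit - reserved_for_answer
    needed = sum(estimate_tokens(doc) for doc in documents)
    strategy = "full_context" if needed <= budget else "retrieval"
    return strategy, needed


# Hypothetical corpus: 20 documents of about 100,000 tokens each.
corpus = ["x" * 400_000] * 20

long_window = choose_strategy(corpus, context_limit=10_000_000)
short_window = choose_strategy(corpus, context_limit=128_000)

print(long_window)
print(short_window)
```

Fitting in the window is necessary but not sufficient. Latency, per-request cost, and answer quality on very long prompts still need to be measured against the existing RAG pipeline.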

Behemoth and What Comes Next

Llama Behemoth — approximately 2 trillion total parameters — was still training when Meta shipped Scout and Maverick. Meta positioned it as a "teacher model" used to improve the smaller models through distillation, while also being available as a standalone frontier model for the most demanding workloads.
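For readers unfamiliar with distillation, the sketch below shows the standard textbook formulation: the student is trained on a blend of the true label (hard target) and the teacher's softened output distribution (soft target). This is an assumption-laden illustration, not Meta's training recipe. The logits, `alpha`, and `temperature` values are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Probabilities from logits; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Blend of hard-label cross-entropy and teacher-matching KL divergence."""
    # Hard-target term: cross-entropy against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])

    # Soft-target term: KL(teacher || student) at the given temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)

    # The temperature**2 factor keeps the soft term's gradient scale comparable.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Hypothetical logits over three classes; the true class is index 0.
teacher = [5.0, 0.0, 0.0]
untrained_student = [0.0, 0.0, 0.0]
matched_student = [5.0, 0.0, 0.0]

print(distillation_loss(untrained_student, teacher, label=0))
print(distillation_loss(matched_student, teacher, label=0))
```

A student that already reproduces the teacher's distribution pays no soft-target penalty, so its loss is lower than the untrained student's. That pressure is the mechanism by which a large teacher can improve smaller models.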

At 2T parameters, Behemoth is likely in the same weight class as the models behind GPT-4 and Claude Opus, whose parameter counts have not been publicly disclosed. If it delivers proportional performance gains over Maverick and ships with a permissive enough license, it becomes the most capable open-weight model in history by a wide margin. That's a meaningful milestone for the market, not because of what Behemoth does on benchmarks, but because of what it signals about Meta's long-term commitment to keeping the frontier open.

The pattern here is clear. Meta released Llama 1 in February 2023, Llama 2 in July 2023, Llama 3 in April 2024, and Llama 4 in April 2025. Each generation narrowed the gap with closed models. At this pace, the question for Llama 5 isn't whether it will match the frontier but whether it will lead it. Track how these dynamics are playing out in AI startup valuations and big tech earnings.

What This Means for Founders and Investors

Opportunities Created

✓ Fine-tuning businesses on open-weight models with proprietary data
✓ Self-hosted AI infrastructure for regulated industries
✓ Long-context applications that were previously cost-prohibitive
✓ Open-weight model tooling (serving, evals, monitoring, fine-tune pipelines)
✓ Vertical AI companies with proprietary training data

Businesses Under Pressure

✕ API wrappers with no data or workflow differentiation
✕ RAG-only businesses where long-context eliminates the problem
✕ Model providers without a clear capability lead narrative
✕ Enterprise AI tools built on high-cost closed APIs with no switching cost
✕ Anything priced on "we're using GPT-4" as a feature

Meta is not building an AI company. It's building a world where no one else can build a moat on top of an AI model alone.

The companies that win the AI era won't be the ones with the best base model — they'll be the ones with the best data, distribution, and workflow ownership.

Track AI company valuations and the open vs. closed model competition on the AI Valuations Dashboard at Value Add VC. Originally published in the Trace Cohen newsletter.
