OpenAI Deep Research turns a single prompt into a 10-to-30-page cited report in 5 to 30 minutes — work that used to take a junior analyst a full day or two. That's the short answer. The longer answer is why it's the first AI feature that replaces a task, not just a keystroke.
OpenAI shipped Deep Research on February 2, 2025, and it landed differently from the chatbot updates that came before it. A normal ChatGPT reply answers in seconds from what the model already knows. Deep Research does something closer to what an analyst does: it plans a research approach, opens and reads dozens of web pages, revises its plan as it learns, and then writes a structured report with inline citations. You don't watch it type — you give it a brief and come back to a finished document.
What the OpenAI Deep Research Feature Actually Is
The OpenAI Deep Research feature is an agentic mode inside ChatGPT, launched February 2, 2025, that autonomously browses the web for 5 to 30 minutes, reads dozens of sources, and writes a structured, fully cited report. It runs on a version of OpenAI's o3 reasoning model and is built for multi-step research, not quick chat answers.
The mechanical difference matters. Standard ChatGPT is reactive — one prompt, one answer. Deep Research is a loop: it decomposes your question, searches, reads, notices gaps, searches again, and only stops when it has enough to write. A single run can touch 30 to 100+ web pages and cite dozens of them inline, so you can click through and check the source behind any claim. OpenAI later added a faster, lighter version powered by an o4-mini-class model to handle simpler queries and raise usage limits, but the headline product is the slow, thorough one you delegate real work to.
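The loop described above can be sketched in a few lines. This is a toy illustration, not OpenAI's implementation: the `TOY_WEB` dictionary, the example URLs, and the `decompose`, `search`, `rephrase`, and `run_research` helpers are all hypothetical stand-ins for a real planner and a real browser.

```python
# Toy stand-in for the web: query -> list of (url, page_text).
# All URLs and text here are made up for illustration.
TOY_WEB = {
    "ai inference chips funding": [
        ("https://example.com/funding", "Funding rounds for inference chip startups."),
    ],
    "ai inference chips investors": [
        ("https://example.com/investors", "Investors backing inference chip startups."),
    ],
}


def decompose(question: str) -> list[str]:
    """Hypothetical planner: split the brief into sub-questions."""
    return [f"{question} funding", f"{question} backers"]


def search(query: str) -> list[tuple[str, str]]:
    """Hypothetical search step: look the query up in the toy web."""
    return TOY_WEB.get(query, [])


def rephrase(query: str) -> str:
    """Hypothetical gap handling: retry a failed query with a synonym."""
    return query.replace("backers", "investors")


def run_research(question: str, max_rounds: int = 5) -> dict[str, list[str]]:
    """Loop until every sub-question has a source or the round budget runs out."""
    pending = decompose(question)
    findings: dict[str, list[str]] = {}
    for _ in range(max_rounds):
        if not pending:
            break  # nothing left to look up, so the report can be written
        gaps = []
        for sub_question in pending:
            pages = search(sub_question)
            if pages:
                findings[sub_question] = [url for url, _text in pages]
            else:
                gaps.append(sub_question)  # noticed a gap
        pending = [rephrase(gap) for gap in gaps]  # search again, differently
    return findings


findings = run_research("ai inference chips")
for sub_question, urls in findings.items():
    print(f"{sub_question}: cited {', '.join(urls)}")
```

The `max_rounds` cap is the design choice worth noticing: a loop that keeps searching until it is satisfied needs a budget, which is presumably one reason real runs take anywhere from 5 to 30 minutes.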
OpenAI Deep Research vs Gemini, Perplexity, and the Field
"Deep research" became a category in 2025, and by 2026 every major AI lab ships a version. They cluster on the same idea — an agent that researches for minutes and returns a cited report — but differ sharply on depth, speed, and price. Here is how OpenAI Deep Research stacks up against the tools people compare it to most.
| Tool | Underlying Model | Humanity's Last Exam | Time / Report | Entry Price | Best At |
|---|---|---|---|---|---|
| OpenAI Deep Research | o3 (deep research) | ~26.6% | 5–30 min | $20/mo | Deepest, best-structured long reports |
| Google Gemini Deep Research | Gemini 2.5 Pro | ~mid-20s% | 5–15 min | $20/mo (Google One AI Premium) | Breadth + Google ecosystem integration |
| Perplexity Deep Research | Multi-model | ~21% | 1–3 min | Free / $20 Pro | Speed and a usable free tier |
| xAI Grok DeepSearch | Grok 4 | ~25–38%* | 2–10 min | $30/mo | Real-time X / social signal |
| Anthropic Claude Research | Claude Opus 4.x | ~mid-20s% | 3–15 min | $20/mo | Careful reasoning, fewer hallucinated cites |
| GPT-Researcher (open source) | Bring-your-own | n/a | Varies | Free + API | Self-hosted, scriptable pipelines |
Figures are mid-2026 estimates blended from vendor announcements (OpenAI, Google, Perplexity, xAI, Anthropic) and the Humanity's Last Exam public results. *Grok scores vary widely with tool use enabled vs disabled. Benchmark numbers shift by harness and scaffolding; entry price reflects the lowest paid tier that unlocks deep research.
The honest read: OpenAI Deep Research wins on depth and structure, Perplexity wins on speed and price, and Gemini wins if you already live in Google Docs and Sheets. No tool wins outright, so serious users run two and route by task. For an investor mapping a sector, OpenAI's thoroughness is worth the wait; for a quick fact-check, Perplexity's sub-3-minute turnaround usually beats it.
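"Route by task" can be made concrete with a small rule of thumb. The sketch below is one possible encoding of the trade-offs in the table, not a recommendation from any vendor; the `pick_tool` function, its parameters, and the 5-minute threshold are assumptions chosen for illustration.

```python
def pick_tool(
    needs_depth: bool,
    deadline_minutes: float,
    uses_google_workspace: bool = False,
) -> str:
    """Hypothetical router based on the depth, speed, and ecosystem trade-offs."""
    if deadline_minutes < 5:
        # Only the fastest tool fits; the table lists 1-3 minutes per report.
        return "Perplexity Deep Research"
    if needs_depth:
        # Thoroughness is worth the 5-30 minute wait.
        return "OpenAI Deep Research"
    if uses_google_workspace:
        # Output lands next to existing Docs and Sheets.
        return "Google Gemini Deep Research"
    return "Perplexity Deep Research"


print(pick_tool(needs_depth=True, deadline_minutes=60))   # sector map for an investor
print(pick_tool(needs_depth=False, deadline_minutes=3))   # quick fact-check
```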
OpenAI Deep Research Pricing: What $20 vs $200 a Month Buys
OpenAI Deep Research has no standalone price — it is bundled into ChatGPT subscriptions, and the tier you pick controls how many reports you can run per month rather than which features you unlock. The expensive plans buy volume and priority, not a different product.
| Plan | Price | Deep Research / Month | Who It's For |
|---|---|---|---|
| ChatGPT Free | $0 | Limited lite version | Trying it out occasionally |
| ChatGPT Plus | $20/mo | ~25 full runs | Individuals, regular research |
| ChatGPT Team | $25/user/mo | ~25 + shared workspace | Small teams with admin needs |
| ChatGPT Pro | $200/mo | ~250 full runs | Power users, daily reports |
| ChatGPT Enterprise | Custom | High / negotiated | Orgs needing SSO + controls |
| Deep Research API | Per-token | Usage-based | Embedding research in products |
Figures are mid-2026 estimates based on OpenAI's published ChatGPT pricing and stated deep research usage allowances, which OpenAI has revised upward several times since the February 2025 launch. Lite (o4-mini-class) runs are counted separately and have higher limits at every tier.
The math is what makes this disruptive. A single junior analyst costs $80,000 to $120,000 in base salary and well over $100,000 fully loaded. The $200 Pro plan, about $2,400 a year, runs roughly 250 deep reports a month, or 3,000 a year. That is about $0.80 per report. Even if you throw away half the output as not good enough, the cost per usable research artifact is about $1.60. That gap is exactly why operators are rethinking how many bodies a research function needs, a shift we track across our AI Valuations dashboard.
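The arithmetic is easy to check. The snippet below is a back-of-the-envelope sketch using the plan prices and run allowances from the pricing table, which are themselves estimates; the 50% usable fraction is the assumption from the paragraph above, and `cost_per_usable_report` is a name introduced here for illustration.

```python
def cost_per_usable_report(
    price_per_month: float,
    runs_per_month: int,
    usable_fraction: float,
) -> float:
    """Annual subscription cost divided by the reports you actually keep."""
    annual_cost = price_per_month * 12
    usable_reports = runs_per_month * 12 * usable_fraction
    return annual_cost / usable_reports


# Assumed inputs: table estimates plus a 50% discard rate.
pro_cost = cost_per_usable_report(price_per_month=200, runs_per_month=250, usable_fraction=0.5)
plus_cost = cost_per_usable_report(price_per_month=20, runs_per_month=25, usable_fraction=0.5)

print(f"Pro:  ${pro_cost:.2f} per usable report")
print(f"Plus: ${plus_cost:.2f} per usable report")
```

Under these assumptions both plans come out at the same unit cost, which fits the point that the expensive tier buys volume rather than a cheaper or different product.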
How Good Is It, Really: Benchmarks and the Hallucination Problem
The benchmark that made people pay attention was Humanity's Last Exam — roughly 3,000 expert-level questions across more than 100 subjects, designed to be brutally hard. OpenAI Deep Research scored about 26.6%, against 3.3% for GPT-4o and around 9% for o1 without browsing. An 8x jump over GPT-4o is real progress.
| Model | Humanity's Last Exam | Setup |
|---|---|---|
| GPT-4o | 3.3% | No browsing/tools |
| o1 | ~9% | Reasoning, no web |
| Deep Research | 26.6% | Agentic web research |
| Human expert | ~90%+ | Domain specialist |
But read 26.6% the other way: it still gets roughly three out of four hard questions wrong. And the failure mode is dangerous because it is confident. Deep Research can write a fluent, well-cited paragraph where the citation only loosely supports the claim — or where a real source is summarized slightly wrong. OpenAI's own launch notes flagged that it can still hallucinate facts and struggle to distinguish authoritative sources from rumor. The practical rule is simple: treat every report as a strong first draft from a fast, tireless researcher who occasionally makes things up. You verify before you act.
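A first pass at verification can be partly mechanical. The sketch below assumes you have pulled each claim, the passage the report quotes for it, and the text of the cited page into plain Python data; the function names, field names, and sample data are all hypothetical. It only checks that a quoted passage actually appears in the cited source, so it catches missing or invented quotes but not a real source that has been summarized slightly wrong.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_supported(quote: str, source_text: str) -> bool:
    """True if the quoted passage appears verbatim (after normalizing) in the source."""
    return normalize(quote) in normalize(source_text)


def claims_needing_review(claims: list[dict], sources: dict[str, str]) -> list[str]:
    """Return claims whose source is missing or does not contain the quoted passage."""
    flagged = []
    for claim in claims:
        source_text = sources.get(claim["url"])
        if source_text is None or not quote_is_supported(claim["quote"], source_text):
            flagged.append(claim["claim"])
    return flagged


# Made-up sample data for illustration.
SOURCES = {
    "https://example.com/report": "Revenue grew 12% year over year,\n driven by enterprise demand.",
}
CLAIMS = [
    {"claim": "Revenue grew 12%", "quote": "revenue grew 12% year over year",
     "url": "https://example.com/report"},
    {"claim": "Margins doubled", "quote": "margins doubled",
     "url": "https://example.com/report"},
    {"claim": "Headcount fell", "quote": "headcount fell",
     "url": "https://example.com/missing"},
]

flagged = claims_needing_review(CLAIMS, SOURCES)
print(flagged)
```

Anything the check flags goes to a human first; anything it passes still needs a read, because a matching quote does not prove the claim follows from it.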
Who Should Use the OpenAI Deep Research Feature — and Who Shouldn't
Deep Research shines on questions that are wide, public, and synthesizable: market landscapes, competitive teardowns, literature reviews, regulatory summaries, and first-draft investment memos. For a VC, "map every funded company building AI inference chips, with funding, backers, and a one-line thesis on each" is a near-perfect prompt — exactly the kind of grind that used to eat an analyst's afternoon. The more specific your brief and the more the answer lives in public text, the better it does.
It is a poor fit where the answer isn't on the open web. Private market data behind paywalls, proprietary datasets, anything requiring original primary interviews, or fast-moving stories where the truth changed an hour ago — these expose its limits. It also can't be accountable: it won't get fired for a wrong number, won't sit in the partner meeting, and won't know which of two conflicting sources your firm trusts. That judgment layer is still yours.
The real shift for founders and operators is what it does to the org chart. When one prompt produces what took a junior a day, the bottleneck moves from gathering information to specifying questions well and verifying answers fast. That doesn't zero out the analyst role — it raises the floor on it, the same way coding agents raised the floor on engineers. We dig into how that capital efficiency is re-rating software margins on the AI Spending dashboard.
OpenAI Deep Research isn't a smarter search box.
It's a research analyst you delegate to — at $20 to $200 a month instead of a $100K+ salary.
It scores 26.6% on the hardest benchmark we have, writes in minutes what took a person a day, and still gets enough wrong that a human has to check it. The teams winning with it aren't the ones who trust it blindly — they're the ones who learned to ask precise questions and verify the answers fast.
Track frontier AI models, agent economics, and valuations across OpenAI, Anthropic, Google, and xAI on the AI Valuations Dashboard and AI Spending Dashboard at Value Add VC. Originally published in the Trace Cohen newsletter.