GPT-5.5's 1M-token context window is the structural unlock that changes litigation discovery workflows. Per OpenAI's GPT-5.5 launch announcement on April 23, 2026, the standard model now accepts up to 1 million input tokens — roughly 750,000 words, or about 3,000 pages of legal document text. A 600-document discovery production fits in a single context. A full deposition transcript across multi-day testimony fits in a single context. A megadoc commercial agreement with all schedules and side letters fits in a single context. The benchmark coverage led with the size; the workflow change is what matters. For litigation associates and discovery vendors, this is the moment chunking-and-retrieval pipelines stop being mandatory.


What 1M tokens actually holds in litigation discovery

1 million tokens is roughly 750,000 words of English legal prose. To anchor that against real workloads:

- A typical mid-size discovery production: 500-2,000 documents, a few pages each on average, call it 1,500-6,000 pages total. The 1M context fits the lower-bound production wholesale, the upper-bound production in two passes.
- A multi-day deposition transcript: 400-800 pages typical, with exhibits. Fits in 1M context with room for the model's reasoning trace.
- A complex commercial agreement: master agreement plus schedules, exhibits, side letters, and prior amendments. Often 200-600 pages total. Fits comfortably.
- A regulatory record (FERC, FCC, FDA): can run 5,000-15,000 pages for major proceedings. Doesn't fit; needs chunking.
- A full M&A data room: 10,000-50,000 pages typical for mid-market deals. Doesn't fit; needs retrieval.

The operational read: 1M context covers most discovery and deposition workloads natively. Mega-cases and large data rooms still need chunking infrastructure. Per TechCrunch's coverage, latency on 1M-token loads matches GPT-5.4 at smaller contexts — there's no speed tax for using the full window. That part is genuinely new.
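A back-of-envelope sizing check makes those fit/doesn't-fit calls concrete. This is a minimal sketch using the rough conversions this piece relies on (about 250 words per page, about 1.3 tokens per word); the helper names and headroom figure are illustrative, and a production workflow would count tokens with a real tokenizer rather than estimate.

```python
# Rough context-fit check for a document set, using the ballpark conversions
# this piece cites: ~250 words/page, ~1.3 tokens/word. Illustrative only.

WORDS_PER_PAGE = 250
TOKENS_PER_WORD = 1.3
CONTEXT_WINDOW = 1_000_000  # GPT-5.5 input limit per the launch coverage


def estimated_tokens(total_pages: int) -> int:
    """Estimate input tokens for a set of pages of legal prose."""
    return int(total_pages * WORDS_PER_PAGE * TOKENS_PER_WORD)


def passes_needed(total_pages: int, headroom_tokens: int = 50_000) -> int:
    """How many full-context passes a workload needs, reserving headroom
    for the question prompt, conversation history, and the model's output."""
    usable = CONTEXT_WINDOW - headroom_tokens
    return -(-estimated_tokens(total_pages) // usable)  # ceiling division


for label, pages in [
    ("600-doc production (~1,200 pages)", 1_200),
    ("multi-day deposition (800 pages)", 800),
    ("regulatory record (15,000 pages)", 15_000),
]:
    print(f"{label}: ~{estimated_tokens(pages):,} tokens, {passes_needed(pages)} pass(es)")
```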

What changes in the discovery workflow

Pre-5.5, the standard discovery workflow ran through a chunk-and-retrieve pipeline. Documents went into a vector database; queries pulled the top-N most-relevant chunks; the model reasoned over those chunks. The pipeline lost cross-document context routinely — if document A on page 47 referenced document C on page 312, the retrieval step often missed the connection.

Post-5.5, with 1M context, the workflow inverts. Load the whole production. Ask the question. The model attends to all documents simultaneously, catches the cross-references the retrieval pipeline missed, and produces a reasoned summary that traces back to specific document IDs and page numbers without retrieval gymnastics.
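A minimal sketch of the load-everything pattern, using the OpenAI Python SDK. The model name, document structure, and DOC/PAGE markers are assumptions for illustration rather than a confirmed GPT-5.5 interface; the point is that the entire production goes in as one stable block of labeled text the model can cite back to.

```python
from openai import OpenAI

client = OpenAI()


def build_production_prompt(documents: list[dict]) -> str:
    """Concatenate the production with explicit doc-ID and page markers so the
    model's summary can cite back to specific documents and pages."""
    parts = []
    for doc in documents:
        parts.append(f"=== DOC {doc['doc_id']} | {doc['title']} ===")
        for page_num, page_text in enumerate(doc["pages"], start=1):
            parts.append(f"[DOC {doc['doc_id']} PAGE {page_num}]\n{page_text}")
    return "\n\n".join(parts)


def first_pass_review(documents: list[dict], question: str) -> str:
    # Keep the production text as the fixed prefix of the prompt and append
    # only the changing question at the end, so repeat queries against the
    # same production can benefit from cached-input pricing.
    production = build_production_prompt(documents)
    response = client.chat.completions.create(
        model="gpt-5.5",  # assumed model name for illustration
        messages=[
            {"role": "system", "content": "You are assisting with first-pass relevance review. "
             "Cite every assertion to a DOC ID and PAGE marker from the production."},
            {"role": "user", "content": production + "\n\n---\n\nQuestion: " + question},
        ],
    )
    return response.choices[0].message.content
```

Keeping the production as the fixed prefix and appending only the question is what makes the caching economics discussed below work; a workload that overflows the window still needs the chunking path.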

The practical implication for an associate doing first-pass relevance review: instead of running 50-100 individual queries against a chunked index, you run one query against the full production and get back a structured summary with cross-document threads pre-identified. That's 5-10x time savings on first pass. The tool calls and legal research coherence spoke walks the agentic version of this workflow.

The limitation: 1M-token loads cost $5 per query just on input at the standard $5/M rate per OpenAI's API pricing. For exploratory research where you load the same production 20 times across an associate team, that's $100 in input alone before counting output. Cached input ($0.50/M, 90% off) recovers most of that gap after first load — but only if your tooling is set up to take advantage of caching.

When 1M context wins versus when retrieval still wins

1M context wins when the workload is single-shot megadoc analysis. You load the production once, ask the question, and want the answer reasoned across the whole set. Discovery first-pass relevance, deposition transcript analysis, full-document contract diligence — all fit this pattern.

Retrieval still wins when the workload is repeated-query against a stable corpus. If your firm has 50,000 prior matters indexed and associates ask 200 questions a week against that corpus, paying $5 per query (or even $0.50 cached) for full-context loads doesn't make sense. A vector retrieval pipeline is cheaper and faster. The model layer becomes one of multiple tools in a stack, not the primary reasoning surface.

The second-order tradeoff: retrieval pipelines lose cross-document context. For new matters where every document is novel and the cross-references are part of what you're trying to find, retrieval fails in a way that 1M context doesn't. For known corpora where you're asking pattern questions across years of accumulated work, retrieval succeeds in ways that 1M context can't economically match.

Most mid-market and BigLaw firms will run both patterns: 1M context for new-matter discovery and deposition work, retrieval pipelines for cross-matter knowledge management. Procurement teams that try to standardize on one pattern will hit workloads that don't fit. The Pro vs standard upgrade decision covers the workload-shape question at the model-tier level.

Comparison: GPT-5.5's 1M against Claude Opus 4.7's 200K plus memory

Anthropic's Opus 4.7 holds a 200K context window — substantially smaller than GPT-5.5's 1M. But Opus 4.7 ships multi-session memory persistence via a scratchpad/notes file (Anthropic docs). Two different solutions to the same long-context problem.

For single-shot litigation discovery analysis where you need the full production reasoned together, GPT-5.5's 1M context wins outright. Opus 4.7 either rejects the load or requires chunking.

For a multi-week matter where the discovery review spans 30+ sessions across a deal team, Opus 4.7's memory model wins. Claude writes notes mid-session, you save the file with the matter, and the next session resumes where the prior one stopped. Same parties, same documents, same line of analysis. GPT-5.5's 1M context resets every session by default; persistence requires custom infrastructure.

The operator read: pick by workload shape. A solo litigator handling 5-10 active matters with megadoc productions — GPT-5.5. A BigLaw associate on a 12-month patent litigation with rolling discovery production — Opus 4.7's memory model. Most BigLaw firms will run both at portfolio scale. The detailed GPT-5.5 vs Claude Opus 4.7 comparison walks the per-use-case math.

Per-matter cost modeling: what 1M-context discovery actually bills

For a 600-document discovery production averaging a couple of pages per document, roughly 1,200 pages in total, the production runs about 400,000 tokens of input text (rough conversion at 1.3 tokens per word, 250 words per page). The model also needs the question prompt and the conversation history, call it 30,000 additional tokens. So a single full-production query is about 430,000 input tokens.

At $5/M input, that's $2.15 per query on the first load. Output for a structured summary runs 5,000-10,000 tokens. At $30/M output, that's $0.15-$0.30. Total: about $2.30-$2.45 per first-pass relevance query against the full production.

If the associate runs 30 queries against the same production over a week (different angles, different parties, different document types), the cached input rate ($0.50/M) applies to the production text, which stays fixed as the prompt prefix, and cuts that part of the input bill by 90%. Cached input on the 400,000-token production runs about $0.20, the fresh 30,000-token prompt about $0.15, and output another $0.15-$0.30, so queries 2-30 cost roughly $0.50-$0.65 each. Total weekly spend: roughly $17-$21 per associate for a full discovery first-pass workflow.
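A minimal sketch of that per-matter arithmetic, using the rates quoted above ($5/M input, $0.50/M cached input, $30/M output) and this piece's token estimates; treat the printout as a planning number, not a quote.

```python
# Per-matter cost model for full-production discovery queries.
# Rates and token counts mirror the figures in this piece; all illustrative.

INPUT_RATE = 5.00 / 1_000_000        # $/token, standard input
CACHED_RATE = 0.50 / 1_000_000       # $/token, cached input
OUTPUT_RATE = 30.00 / 1_000_000      # $/token, output

PRODUCTION_TOKENS = 400_000          # the production itself (stable prefix)
PROMPT_TOKENS = 30_000               # question + conversation history
OUTPUT_TOKENS = (5_000, 10_000)      # structured summary, low/high estimate


def query_cost(cached: bool) -> tuple[float, float]:
    """Return (low, high) cost of one full-production query."""
    prefix_rate = CACHED_RATE if cached else INPUT_RATE
    input_cost = PRODUCTION_TOKENS * prefix_rate + PROMPT_TOKENS * INPUT_RATE
    return tuple(input_cost + out * OUTPUT_RATE for out in OUTPUT_TOKENS)


first_low, first_high = query_cost(cached=False)
rep_low, rep_high = query_cost(cached=True)

queries_per_week = 30
week_low = first_low + (queries_per_week - 1) * rep_low
week_high = first_high + (queries_per_week - 1) * rep_high

print(f"First query:  ${first_low:.2f}-${first_high:.2f}")
print(f"Cached query: ${rep_low:.2f}-${rep_high:.2f}")
print(f"Weekly (30q): ${week_low:.2f}-${week_high:.2f}")
```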

That's the per-matter math partners can put in a budget memo. Compare to the bulk-discovery vendor pricing model (per-document fees, per-GB ingestion charges, monthly platform minimums) and the in-house workflow becomes economically defensible for most mid-market matters. The third-order angle: discovery vendors who built their pricing around per-document fees are now competing against an in-house GPT-5.5 workflow with sub-dollar per-query economics. Vendor pricing pressure will arrive within 12 months. The GPT-5.5 vs Harvey AI / CoCounsel vendor decision spoke covers the legal-tech vendor side of this same shift.

Privilege and verification considerations on 1M-context loads

Loading 600 documents into a single context creates two operational risks. First, privileged documents inadvertently pulled into a non-privileged production review get reasoned over by the model. The model's output may surface privileged content even if the prompt didn't ask for it. Mitigation: pre-screen the production for privilege markers before the load, and configure the prompt to flag any document marked privileged, with an explicit instruction to disregard its contents. The Heppner ruling (SDNY Feb 17, 2026 explainer) confirmed consumer-AI exchanges aren't privileged; enterprise tier is the procurement floor for privileged work.
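A minimal pre-screen sketch, assuming the production is already extracted to text and that privilege shows up via common header markers; the marker list and the choice to route flagged documents to human review rather than load them are illustrative, and none of this substitutes for a proper privilege log review.

```python
import re

# Common privilege markers; illustrative, not an exhaustive or authoritative list.
PRIVILEGE_MARKERS = re.compile(
    r"attorney[- ]client privilege|attorney work product|privileged and confidential",
    re.IGNORECASE,
)


def screen_production(documents: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split a production into (clean, flagged) before any model load.
    Flagged documents get routed to human privilege review, not the prompt."""
    clean, flagged = [], []
    for doc in documents:
        text = "\n".join(doc["pages"])
        (flagged if PRIVILEGE_MARKERS.search(text) else clean).append(doc)
    return clean, flagged
```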

Second, citation verification still applies. Even with the full context loaded, the model can summarize a document's holding incorrectly. The verification protocol post-5.5 looks like this: every model-generated assertion about a document's content gets confirmed against the actual document by an associate spot-checking the cited page. The citation verification protocol spoke walks the workflow change.
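One way to operationalize the spot-check, sketched on the assumption that the model's summary cites documents using the same DOC/PAGE markers applied at load time; the citation format and sample size are illustrative choices, not a prescribed protocol.

```python
import random
import re

# Matches citations like "[DOC 0042 PAGE 17]" in the model's summary;
# the format is an assumption tied to how the production was labeled at load.
CITATION = re.compile(r"\[DOC (?P<doc_id>\S+) PAGE (?P<page>\d+)\]")


def sample_citations_for_review(summary: str, sample_size: int = 20) -> list[dict]:
    """Pull every citation out of the model's summary and return a random
    sample for an associate to verify against the underlying documents."""
    cites = [m.groupdict() for m in CITATION.finditer(summary)]
    random.shuffle(cites)
    return cites[:sample_size]
```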

The second-order risk: ChatGPT Plus ($20/month) carries weaker data-handling terms than ChatGPT Business ($25/user/month per OpenAI Business pricing) or Enterprise (quote-only). Associates loading client production documents into Plus accounts are creating a data-handling problem. Business or Enterprise is the procurement floor for any privileged or client-confidential discovery workflow.

The Bottom Line: GPT-5.5's 1M context window changes the economics of litigation discovery first-pass review. For most mid-market matters, the in-house workflow at well under a dollar per cached query beats per-document vendor pricing on per-matter math. The structural caveat is verification: calibration improved but didn't eliminate failure modes. Update the protocol, document the model version in your file notes, and pre-screen for privilege before loading.

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.