Claude Opus 4.7 task budgets are the feature legal procurement actually needed and didn't ask for. Anthropic shipped them in the April 16, 2026 release alongside the new "xhigh" effort level and multi-session memory persistence. For discovery document review — the wedge use case for legal AI since 2023 — task budgets convert unpredictable token consumption into a defensible line item. A partner can now put Claude spend in a budget memo with a number on it. Per Anthropic's Claude Opus 4.7 release notes, the budget is enforced across an entire agentic loop, with a running countdown that helps the model prioritize and finish gracefully. Here's how that changes the discovery economics.


What task budgets actually are at the model layer

A task budget is a token cap set on a multi-step agent run. You tell Claude: "Spend up to 2 million tokens completing this discovery review." The model tracks usage against the cap as it works, prioritizes the highest-signal documents first, and stops gracefully when it approaches the limit — reporting what got covered and where it stopped.

This isn't a rate limit. It's a *budget* that the model is aware of and plans against. The 4.6 behavior was: the agent ran until it finished or until you killed it. The 4.7 behavior is: the agent runs until it finishes, hits the cap, or determines the remaining budget can't cover the remaining work productively, and reports cleanly.
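
To make the accounting concrete, here's a minimal sketch of that loop, tracked client-side rather than model-natively, since the release notes quoted above don't publish the budget parameter surface. The model ID and the two stub helpers are assumptions, not Anthropic's API; the usage fields on the response are the SDK's real ones.

```python
# Client-side sketch of budget accounting over an agent run. NOT the
# model-native budget API described in the release notes; the model ID
# and stub helpers below are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
BUDGET = 2_000_000               # token cap for the whole agent run
RESERVE = 50_000                 # headroom kept back for the final report step

def make_review_prompts():
    # Stub: a real run would yield one prompt per document,
    # highest-signal documents first.
    yield "Classify this document as responsive or not responsive: ..."

def run_step(prompt: str) -> tuple[str, int]:
    """One agent step; returns the model's text and the tokens consumed."""
    response = client.messages.create(
        model="claude-opus-4-7",   # hypothetical model ID
        max_tokens=2_000,
        messages=[{"role": "user", "content": prompt}],
    )
    used = response.usage.input_tokens + response.usage.output_tokens
    return response.content[0].text, used

spent = 0
for prompt in make_review_prompts():
    if BUDGET - spent < RESERVE:
        break                     # stop gracefully instead of blowing the cap
    text, used = run_step(prompt)
    spent += used
    # ... persist the relevance call for the review log ...

print(f"Run complete: {spent:,} of {BUDGET:,} tokens used.")
```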

At Opus 4.7's $5 per million input tokens and $25 per million output tokens (per Anthropic's pricing page), a 2M-token budget at a typical 70/30 input/output split is roughly $22 in raw model spend per agent run. That's a number a partner can put in a matter budget memo and defend at the post-matter cost-recovery review.
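
The arithmetic behind that $22 figure, using the stated rates and split:

```python
# Reproducing the $22 figure from the stated pricing and 70/30 split.
budget_tokens = 2_000_000
input_cost  = (budget_tokens * 0.70 / 1e6) * 5.00    # $7.00
output_cost = (budget_tokens * 0.30 / 1e6) * 25.00   # $15.00
print(f"${input_cost + output_cost:.2f} per agent run")  # $22.00
```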

Why discovery document review specifically benefits

Discovery has always been the legal-AI economics problem. A 50,000-document production needs first-pass relevance review. The traditional vendor model priced this per document, per page, or per hour. The in-house Claude model priced it per token, which made it cheaper *on average* but left every matter exposed to consumption variance: some documents are short, some are long, some require deep reasoning, some get auto-classified instantly.

With task budgets, you set a cap on the entire 50,000-document run. Claude prioritizes by signal: running cheap classification on the obviously irrelevant documents first and reserving expensive reasoning for the documents that matter. Within a few hundred documents, the model has seen enough of the corpus to recognize recurring patterns, classify the repeats cheaply, and finish the production within budget.
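
A client-side sketch of that prioritization, with assumed per-document token costs and an assumed escalation threshold (the release notes describe the model doing this natively; this just illustrates the tiering):

```python
# Two-tier triage sketch: cheap metadata/first-page classification first,
# deep reasoning reserved for uncertain documents. Per-doc token costs,
# the 0.8 threshold, and both helpers are illustrative assumptions.
CHEAP_TOKENS_PER_DOC = 600     # metadata + first page
DEEP_TOKENS_PER_DOC = 6_000    # full read with explained reasoning

def cheap_classify(doc: str) -> tuple[str, float]:
    return "not responsive", 0.95   # stub: (label, confidence)

def deep_review(doc: str) -> str:
    return "responsive"             # stub: expensive full-document call

def triage(docs: list[str], budget: int) -> int:
    spent, uncertain = 0, []
    for doc in docs:                # first pass: cheap calls on everything
        if spent + CHEAP_TOKENS_PER_DOC > budget:
            break
        label, confidence = cheap_classify(doc)
        spent += CHEAP_TOKENS_PER_DOC
        if confidence < 0.8:        # low confidence: queue for deep review
            uncertain.append(doc)
    for doc in uncertain:           # second pass: spend the rest on hard cases
        if spent + DEEP_TOKENS_PER_DOC > budget:
            break
        deep_review(doc)
        spent += DEEP_TOKENS_PER_DOC
    return spent
```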

The second-order effect: discovery vendors who priced themselves on "per-document fixed fee" assumptions are now competing against in-house workflows with provable per-matter economics. The third-order effect: insurance carriers writing AI deployment policies will start asking firms whether they use models with task budgets, because predictability lowers operational risk. Carriers price predictability; that's literally the underwriting model. The Opus 4.7 anchor covers the broader procurement implications.

How to size a task budget for a real discovery review

Three inputs determine the right cap:

Document count and average length. A 50,000-document production averaging 3 pages (~1,500 tokens) per document is 75M input tokens just to read the corpus once. Most relevance reviews don't read every doc fully on first pass; they classify on metadata + first-page content. Realistic first-pass input: 20-30M tokens.

Reasoning depth required. Routine relevance review (responsive vs not) consumes much less output than privilege review (which requires explaining the privilege call). Plan 5-15M output tokens for relevance, 20-40M for privilege.

Effort level. xhigh consumes more reasoning tokens per task than high. For first-pass relevance, high is usually sufficient. For privilege calls, xhigh earns its premium by reducing false negatives (which carry malpractice exposure).

A defensible budget for a 50,000-document first-pass relevance review at high effort runs 25-45M tokens total (20-30M input, 5-15M output), or roughly $225-525 in raw model spend at the stated rates. A privilege re-review at xhigh on the responsive subset (say 8,000 docs) runs another 12-18M tokens; assuming roughly half of those are output tokens, that's $180-270. The full first-pass + privilege workflow lands in the $400-800 range. That's a number for the budget memo. The tokenizer cost calculator lets you model your specific document mix.
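
A small helper that reproduces this sizing math, with the token estimates from the three inputs above as its arguments; the privilege-phase input/output split is an assumption, and you'd swap in your own document mix:

```python
# Sizing math for the two phases at the stated $5/M input and $25/M output
# rates. Token figures are the planning estimates above; the privilege
# split is an assumption.
IN_RATE, OUT_RATE = 5.00, 25.00   # dollars per million tokens

def phase_cost(input_m: float, output_m: float) -> float:
    return input_m * IN_RATE + output_m * OUT_RATE

relevance = (phase_cost(20, 5), phase_cost(30, 15))   # ($225, $525)
privilege = (phase_cost(6, 6), phase_cost(9, 9))      # ($180, $270)
total = (relevance[0] + privilege[0], relevance[1] + privilege[1])
print(total)                                          # (405.0, 795.0)
```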

What happens when the budget runs out mid-matter

Two failure modes the procurement memo should anticipate:

Underrun is fine. Claude finishes early, reports the unused budget, and you're done. The bill is lower than projected. This happens when the document corpus is more uniform than expected and the model classifies most documents quickly.

Overrun requires a decision. Claude approaches the cap and reports: "Covered 38,400 of 50,000 documents within budget. Remaining 11,600 estimated to require an additional 8M tokens." The matter team now has a clean decision point: extend the budget, redirect the remaining docs to a different model, or accept the partial coverage with appropriate documentation.

The operational rule: the budget shouldn't be set so tight that overruns are routine, and shouldn't be set so loose that the cap doesn't constrain. A reasonable starting heuristic is 1.3x the expected token consumption. That gives the model room to handle outlier-document complexity without blowing the cap, but still constrains an out-of-control agent run.
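
Applied to the first-pass estimate from the sizing above, the heuristic looks like this (the 1.3x multiplier is a starting point, not a tuned value):

```python
# 1.3x headroom heuristic: room for outlier documents, still a binding cap.
expected_tokens = 30_000_000           # mid-range first-pass estimate from above
cap = int(expected_tokens * 1.3)       # 39,000,000-token budget
print(f"{cap:,} token cap")
```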

For matters with hard cost ceilings (fixed-fee engagements, court-supervised budgets, insurance-funded defense), the budget becomes a risk-management tool, not just a planning tool. The multi-session memory M&A diligence guide covers an analogous use case in transactional work.

The procurement and ethics policy implications

Task budgets change three things in firm AI policy:

Budget memos can include AI line items with defensible numbers. Before 4.7, the honest answer was "somewhere between $300 and $4,000 this month." Now it's "Claude review: 2M token budget, $22 per agent run, expected 50 runs across the matter, $1,100 capped."

Cost recovery and matter billing get cleaner. Firms that bill AI consumption back to clients can now project a cap and bill against actual usage with a defensible underlying methodology. Firms that absorb AI cost into overhead can defend the per-matter math at year-end review.

Insurance and procurement questionnaires will start asking about task budgets. AI deployment policies that name the model but not the budget structure are now stale. Update the policy template to specify (a) which matters get task budgets enabled, (b) who sets the cap, (c) what the escalation rule is on overrun. The cybersecurity safeguards privileged context spoke covers a parallel policy update.

For firms running consumption-based Anthropic Enterprise contracts, the task budget is also a leverage point in renewal negotiations. Predictable consumption patterns let you commit to volume tiers with less risk.

Where task budgets don't fit

Three workflows where task budgets add overhead without much benefit:

One-shot research questions. Asking Claude a single legal research question that returns in 30 seconds doesn't need a token cap. The bill is small either way. Task budgets earn their keep on long-running agent loops.

Interactive drafting sessions. A partner working through a brief with Claude over an hour isn't running an agent; they're conversing. A budget cap interrupts the flow without saving meaningful spend.

Highly variable matters where budget signals nothing useful. If you don't know whether the matter will be 5,000 documents or 500,000, setting a token budget is theater. Use Claude on a small sample first to calibrate, then set a budget for the production run.
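
A sketch of that calibration step; the sample size, the token-counting helper, and the reuse of the 1.3x headroom rule are all assumptions:

```python
# Calibrate on a sample, then extrapolate a budget for the production run.
import random

def review_and_count_tokens(doc: str) -> int:
    return 1_500   # stub: run the review step and return tokens consumed

def calibrated_budget(docs: list[str], sample_size: int = 500) -> int:
    sample = random.sample(docs, min(sample_size, len(docs)))
    per_doc = sum(review_and_count_tokens(d) for d in sample) / len(sample)
    return int(per_doc * len(docs) * 1.3)   # extrapolate, then add headroom
```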

The rule of thumb: if the work has a defined corpus and a defined output target, task budgets help. If the work is exploratory or interactive, they don't. The creative writing brief drafting spoke covers the interactive end of the spectrum.

The Bottom Line: task budgets are the procurement-grade feature 4.7 shipped without fanfare. They convert AI discovery review from "unpredictable consumption" to "line item with a defensible cap." Any firm running matter-level Claude deployments should have task budgets configured by the next discovery production. Skip them on interactive work; require them on agent-driven document review.

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.