Claude Opus 4.7 task budgets are the feature legal procurement actually needed and didn't ask for. Anthropic shipped them in the April 16, 2026 release alongside the new "xhigh" effort level and multi-session memory persistence. For discovery document review — the wedge use case for legal AI since 2023 — task budgets convert unpredictable token consumption into a defensible line item. A partner can now put Claude spend in a budget memo with a number on it. Per Anthropic's Claude Opus 4.7 release notes, the budget is enforced across an entire agentic loop, with a running countdown that helps the model prioritize and finish gracefully. Here's how that changes the discovery economics.


What task budgets actually are at the model layer

A task budget is a token cap set on a multi-step agent run. You tell Claude: "Spend up to 2 million tokens completing this discovery review." The model tracks usage against the cap as it works, prioritizes the highest-signal documents first, and stops gracefully when it approaches the limit — reporting what got covered and where it stopped.

This isn't a rate limit. It's a *budget* that the model is aware of and plans against. The 4.6 behavior was: the agent ran until it finished or until you killed it. The 4.7 behavior is: the agent runs until it finishes, hits the cap, or determines the remaining budget can't cover the remaining work productively, and reports cleanly.
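
To make the accounting concrete, here's a minimal sketch of that loop, tracked client-side rather than model-natively, since the release notes quoted above don't publish the budget parameter surface. The model ID and the two stub helpers are assumptions, not Anthropic's API; the usage fields on the response are the SDK's real ones.

```python
# Client-side sketch of budget accounting over an agent run. NOT the
# model-native budget API described in the release notes; the model ID
# and stub helpers below are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
BUDGET = 2_000_000               # token cap for the whole agent run
RESERVE = 50_000                 # headroom kept back for the final report step

def make_review_prompts():
    # Stub: a real run would yield one prompt per document,
    # highest-signal documents first.
    yield "Classify this document as responsive or not responsive: ..."

def run_step(prompt: str) -> tuple[str, int]:
    """One agent step; returns the model's text and the tokens consumed."""
    response = client.messages.create(
        model="claude-opus-4-7",   # hypothetical model ID
        max_tokens=2_000,
        messages=[{"role": "user", "content": prompt}],
    )
    used = response.usage.input_tokens + response.usage.output_tokens
    return response.content[0].text, used

spent = 0
for prompt in make_review_prompts():
    if BUDGET - spent < RESERVE:
        break                     # stop gracefully instead of blowing the cap
    text, used = run_step(prompt)
    spent += used
    # ... persist the relevance call for the review log ...

print(f"Run complete: {spent:,} of {BUDGET:,} tokens used.")
```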

At Opus 4.7's $5 per million input tokens and $25 per million output tokens (per Anthropic's pricing page), a 2M-token budget at a typical 70/30 input/output split is roughly $22 in raw model spend per agent run. That's a number a partner can put in a matter budget memo and defend at the post-matter cost-recovery review.
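
The arithmetic behind that $22 figure, using the stated rates and split:

```python
# Reproducing the $22 figure from the stated pricing and 70/30 split.
budget_tokens = 2_000_000
input_cost  = (budget_tokens * 0.70 / 1e6) * 5.00    # $7.00
output_cost = (budget_tokens * 0.30 / 1e6) * 25.00   # $15.00
print(f"${input_cost + output_cost:.2f} per agent run")  # $22.00
```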

Why discovery document review specifically benefits

Discovery has always been the legal-AI economics problem. A 50,000-document production needs first-pass relevance review. The traditional vendor model priced this per document, per page, or per hour. The in-house Claude model priced it per token, which made it cheaper *on average* but left every matter exposed to consumption variance: some documents are short, some are long, some require deep reasoning, some get auto-classified instantly.

With task budgets, you set a cap on the entire 50,000-document run. Claude prioritizes by signal: running cheap classification on the obviously irrelevant documents first and reserving expensive reasoning for the documents that matter. Within a few hundred documents, the model has seen enough of the corpus to recognize recurring patterns, classify the repeats cheaply, and finish the production within budget.
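
A client-side sketch of that prioritization, with assumed per-document token costs and an assumed escalation threshold (the release notes describe the model doing this natively; this just illustrates the tiering):

```python
# Two-tier triage sketch: cheap metadata/first-page classification first,
# deep reasoning reserved for uncertain documents. Per-doc token costs,
# the 0.8 threshold, and both helpers are illustrative assumptions.
CHEAP_TOKENS_PER_DOC = 600     # metadata + first page
DEEP_TOKENS_PER_DOC = 6_000    # full read with explained reasoning

def cheap_classify(doc: str) -> tuple[str, float]:
    return "not responsive", 0.95   # stub: (label, confidence)

def deep_review(doc: str) -> str:
    return "responsive"             # stub: expensive full-document call

def triage(docs: list[str], budget: int) -> int:
    spent, uncertain = 0, []
    for doc in docs:                # first pass: cheap calls on everything
        if spent + CHEAP_TOKENS_PER_DOC > budget:
            break
        label, confidence = cheap_classify(doc)
        spent += CHEAP_TOKENS_PER_DOC
        if confidence < 0.8:        # low confidence: queue for deep review
            uncertain.append(doc)
    for doc in uncertain:           # second pass: spend the rest on hard cases
        if spent + DEEP_TOKENS_PER_DOC > budget:
            break
        deep_review(doc)
        spent += DEEP_TOKENS_PER_DOC
    return spent
```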

The second-order effect: discovery vendors who priced themselves on "per-document fixed fee" assumptions are now competing against in-house workflows with provable per-matter economics. The third-order effect: insurance carriers writing AI deployment policies will start asking firms whether they use models with task budgets, because predictability lowers operational risk. Carriers price predictability; that's literally the underwriting model. The Opus 4.7 anchor covers the broader procurement implications.

How to size a task budget for a real discovery review

Three inputs determine the right cap:

Document count and average length. A 50,000-document production averaging 3 pages (~1,500 tokens) per document is 75M input tokens just to read the corpus once. Most relevance reviews don't read every doc fully on first pass; they classify on metadata + first-page content. Realistic first-pass input: 20-30M tokens.

Reasoning depth required. Routine relevance review (responsive vs not) consumes much less output than privilege review (which requires explaining the privilege call). Plan 5-15M output tokens for relevance, 20-40M for privilege.

Effort level. xhigh consumes more reasoning tokens per task than high. For first-pass relevance, high is usually sufficient. For privilege calls, xhigh earns its premium by reducing false negatives (which carry malpractice exposure).

A defensible budget for a 50,000-document first-pass relevance review at high effort runs 25-45M tokens total (20-30M input, 5-15M output), or roughly $225-525 in raw model spend at the stated rates. A privilege re-review at xhigh on the responsive subset (say 8,000 docs) runs another 12-18M tokens; assuming roughly half of those are output tokens, that's $180-270. The full first-pass + privilege workflow lands in the $400-800 range. That's a number for the budget memo. The tokenizer cost calculator lets you model your specific document mix.
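
A small helper that reproduces this sizing math, with the token estimates from the three inputs above as its arguments; the privilege-phase input/output split is an assumption, and you'd swap in your own document mix:

```python
# Sizing math for the two phases at the stated $5/M input and $25/M output
# rates. Token figures are the planning estimates above; the privilege
# split is an assumption.
IN_RATE, OUT_RATE = 5.00, 25.00   # dollars per million tokens

def phase_cost(input_m: float, output_m: float) -> float:
    return input_m * IN_RATE + output_m * OUT_RATE

relevance = (phase_cost(20, 5), phase_cost(30, 15))   # ($225, $525)
privilege = (phase_cost(6, 6), phase_cost(9, 9))      # ($180, $270)
total = (relevance[0] + privilege[0], relevance[1] + privilege[1])
print(total)                                          # (405.0, 795.0)
```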

What happens when the budget runs out mid-matter

Two failure modes the procurement memo should anticipate:

Underrun is fine. Claude finishes early, reports the unused budget, and you're done. The bill is lower than projected. This happens when the document corpus is more uniform than expected and the model classifies most documents quickly.

Overrun requires a decision. Claude approaches the cap and reports: "Covered 38,400 of 50,000 documents within budget. Remaining 11,600 estimated to require an additional 8M tokens." The matter team now has a clean decision point: extend the budget, redirect the remaining docs to a different model, or accept the partial coverage with appropriate documentation.

The operational rule: the budget shouldn't be set so tight that overruns are routine, and shouldn't be set so loose that the cap doesn't constrain. A reasonable starting heuristic is 1.3x the expected token consumption. That gives the model room to handle outlier-document complexity without blowing the cap, but still constrains an out-of-control agent run.
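
Applied to the first-pass estimate from the sizing above, the heuristic looks like this (the 1.3x multiplier is a starting point, not a tuned value):

```python
# 1.3x headroom heuristic: room for outlier documents, still a binding cap.
expected_tokens = 30_000_000           # mid-range first-pass estimate from above
cap = int(expected_tokens * 1.3)       # 39,000,000-token budget
print(f"{cap:,} token cap")
```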

For matters with hard cost ceilings (fixed-fee engagements, court-supervised budgets, insurance-funded defense), the budget becomes a risk-management tool, not just a planning tool. The multi-session memory M&A diligence guide covers an analogous use case in transactional work.

The procurement and ethics policy implications

Task budgets change three things in firm AI policy:

Budget memos can include AI line items with defensible numbers. Before 4.7, the honest answer was "somewhere between $300 and $4,000 this month." Now it's "Claude review: 2M token budget, $22 per agent run, expected 50 runs across the matter, $1,100 capped."

Cost recovery and matter billing get cleaner. Firms that bill AI consumption back to clients can now project a cap and bill against actual usage with a defensible underlying methodology. Firms that absorb AI cost into overhead can defend the per-matter math at year-end review.

Insurance and procurement questionnaires will start asking about task budgets. AI deployment policies that name the model but not the budget structure are now stale. Update the policy template to specify (a) which matters get task budgets enabled, (b) who sets the cap, (c) what the escalation rule is on overrun. The cybersecurity safeguards privileged context spoke covers a parallel policy update.

For firms running consumption-based Anthropic Enterprise contracts, the task budget is also a leverage point in renewal negotiations. Predictable consumption patterns let you commit to volume tiers with less risk.

Where task budgets don't fit

Three workflows where task budgets add overhead without much benefit:

One-shot research questions. Asking Claude a single legal research question that returns in 30 seconds doesn't need a token cap. The bill is small either way. Task budgets earn their keep on long-running agent loops.

Interactive drafting sessions. A partner working through a brief with Claude over an hour isn't running an agent; they're conversing. A budget cap interrupts the flow without saving meaningful spend.

Highly variable matters where budget signals nothing useful. If you don't know whether the matter will be 5,000 documents or 500,000, setting a token budget is theater. Use Claude on a small sample first to calibrate, then set a budget for the production run.
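
A sketch of that calibration step; the sample size, the token-counting helper, and the reuse of the 1.3x headroom rule are all assumptions:

```python
# Calibrate on a sample, then extrapolate a budget for the production run.
import random

def review_and_count_tokens(doc: str) -> int:
    return 1_500   # stub: run the review step and return tokens consumed

def calibrated_budget(docs: list[str], sample_size: int = 500) -> int:
    sample = random.sample(docs, min(sample_size, len(docs)))
    per_doc = sum(review_and_count_tokens(d) for d in sample) / len(sample)
    return int(per_doc * len(docs) * 1.3)   # extrapolate, then add headroom
```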

The rule of thumb: if the work has a defined corpus and a defined output target, task budgets help. If the work is exploratory or interactive, they don't. The creative writing brief drafting spoke covers the interactive end of the spectrum.

The Bottom Line: task budgets are the procurement-grade feature 4.7 shipped without fanfare. They convert AI discovery review from "unpredictable consumption" to "line item with a defensible cap." Any firm running matter-level Claude deployments should have task budgets configured by the next discovery production. Skip them on interactive work; require them on agent-driven document review.

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.