When Anthropic shipped Claude Opus 4.7 on April 16, 2026, it added a new "xhigh" effort tier between high and max. Per the What's New in Claude Opus 4.7 docs, xhigh gives finer control over the reasoning-latency tradeoff. Claude Code defaults to xhigh on all paid plans, which means associates already running Claude Code without firm authorization are getting xhigh by default, and the bill reflects it. For legal teams whose AI policy names "Claude" without naming an effort level, that policy is now stale. Here's when xhigh earns its premium and when it doesn't.


What effort levels actually do

Claude's effort level controls how much reasoning the model does before returning an answer. Higher effort means more internal token consumption, longer latency, and (usually) better answers on complex tasks. Lower effort means fewer tokens, faster responses, and (usually) sufficient quality on routine work.

The 4.7 effort hierarchy:

- Low: minimal reasoning, fastest, cheapest output spend. For straightforward retrieval and short answers.
- Medium: moderate reasoning, balanced. The default for most interactive use.
- High: extensive reasoning, longer latency. For complex analysis where quality matters more than speed.
- xhigh (new in 4.7): between high and max. Adds reasoning depth without the full max-effort latency penalty.
- Max: deepest reasoning, longest latency, highest spend. For the hardest tasks where wrong answers carry significant cost.

xhigh shipped to close a quality gap between high and max: on some tasks high wasn't quite enough, and max was overkill. xhigh fills that gap. The Opus 4.7 anchor covers the full change set.
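As a sketch of what selecting an effort tier per request might look like, here is a minimal request builder. Note that the `effort` field, its accepted values, and the `claude-opus-4-7` model id are assumptions drawn from the tier list above, not confirmed SDK parameters:

```python
# Hypothetical sketch: selecting an effort tier per request.
# The "effort" field and the model id are assumptions based on the
# tier list above, not confirmed Anthropic SDK parameters.

VALID_EFFORT = ("low", "medium", "high", "xhigh", "max")

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a request payload with an explicit effort tier."""
    if effort not in VALID_EFFORT:
        raise ValueError(f"unknown effort tier: {effort!r}")
    return {
        "model": "claude-opus-4-7",   # hypothetical model id
        "effort": effort,
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize the indemnity clause.", effort="xhigh")
```

The point of the wrapper is governance, not convenience: a firm that routes every call through one builder can validate the tier against policy in a single place rather than auditing each associate's scripts.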

When xhigh earns the premium

Four categories where xhigh consistently earns its premium:

Multi-step legal reasoning with citation interaction. Drafting a brief that needs to address opposing counsel's argument with case law that cuts in different directions across jurisdictions. xhigh's deeper reasoning catches cross-citation conflicts that high misses.

Contract clause interaction analysis. Reviewing a complex commercial contract where Section 7's indemnity caps interact with Section 12's limitation of liability and Section 18's choice-of-law. Each section read in isolation is fine on high. The interaction analysis is where xhigh pulls ahead.

Deposition strategy and witness preparation. Building a witness exam outline that anticipates likely defenses, weighs credibility attacks, and sequences questions so later lines of examination aren't telegraphed early. xhigh handles the strategic layering better than high.

Privileged document review where false negatives carry malpractice risk. Privilege review at xhigh reduces the rate at which privileged documents get marked non-privileged and produced. The cost premium is small relative to the malpractice exposure that follows a privilege waiver.

For these categories, the additional output-token spend at xhigh is meaningful but not dramatic: typically 1.3-1.7x the high-effort spend on the same task. The discovery task-budgets spoke covers the cost-control side.

When xhigh is overkill

Four categories where xhigh adds cost without much benefit:

Routine document classification. First-pass relevance review on a discovery production. The model is making binary or low-cardinality decisions; high effort is more than enough. xhigh increases the bill without changing the classification quality materially.

Standardized form drafting. NDAs against a known playbook, simple engagement letters, routine demand letters. The work is template-driven; the model isn't doing creative reasoning. high or even medium handles this well.

Interactive Q&A research. A partner asking quick legal-research questions over an hour-long working session. xhigh's latency tax accumulates across the session and frustrates the user. medium or high keeps the conversation flowing without sacrificing quality on the kind of questions that get asked interactively.

Bulk extraction work. Pulling structured data from a large corpus: dollar amounts, dates, party names. The work is mechanical; reasoning depth doesn't help.

The heuristic: if the work is interaction-heavy or pattern-matching, stay at medium or high. If the work is interaction-light and the analysis is the deliverable, xhigh or max earns the premium. The creative writing brief drafting spoke covers a related effort-level decision.

Claude Code defaults and the policy implication

Claude Code defaults to xhigh on all paid plans. That default carries a procurement implication most firms haven't yet surfaced.

Associates running Claude Code on personal Pro subscriptions ($20/user/month) or firm-issued Team accounts ($25/user/month) are running xhigh by default. The output-token consumption is materially higher than equivalent work on lower effort. For firms tracking consumption-based AI spend, the bill reflects the default.

The second-order effect: AI policies that say "Claude is approved for use" without specifying effort levels are de facto authorizing xhigh on Claude Code. That may or may not be the firm's intent. The third-order effect: associates building automation tooling on Claude Code (which legitimately benefits from xhigh) get the right experience by default; associates running Claude Code for routine tasks get the wrong cost profile.

The policy update: name the model version (4.7+), name the deployment surface (claude.ai Team, Enterprise, API, Bedrock, Vertex, Foundry), and either name the effort level expectations (e.g., "high default; xhigh for designated matter types") or accept the Claude Code xhigh default. The jailbreak risk and confidentiality firm policy spoke covers a parallel policy update.

Cost math: what xhigh actually adds to a matter bill

Anthropic's published pricing as of April 2026 is $5 per million input tokens and $25 per million output tokens for Opus 4.7. Effort levels affect output token consumption; xhigh produces more reasoning tokens per response than high.

Approximate output-token multipliers (model behavior, not vendor commitment):

- Medium: 1.0x baseline
- High: 1.4-1.8x medium
- xhigh: 1.8-2.5x medium
- Max: 2.5-4.0x medium

For a legal research session that consumes 500K output tokens at medium ($12.50), high lands around $20, xhigh around $30, and max around $50. Per-session, the differences are small. Across 50 sessions a month per attorney across a 25-attorney mid-market firm, the differences compound: medium ~$15K, high ~$24K, xhigh ~$36K, max ~$60K.
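The session and monthly figures above can be reproduced with a small cost model. This is a sketch using the article's own numbers: the $25/M output price, the midpoints of the multiplier ranges, and the 25-attorney, 50-sessions-a-month scenario. The multipliers are approximate model behavior, not commitments:

```python
# Cost model for effort-level output spend, using the article's figures.
# Multipliers are midpoints of the approximate ranges quoted above.

OUTPUT_PRICE_PER_M = 25.0   # $ per million output tokens (Opus 4.7)
MULTIPLIER = {
    "medium": 1.0,
    "high": 1.6,            # midpoint of 1.4-1.8x
    "xhigh": 2.15,          # midpoint of 1.8-2.5x
    "max": 3.25,            # midpoint of 2.5-4.0x
}

def session_cost(output_tokens_at_medium: int, effort: str) -> float:
    """Dollar cost of one session's output tokens at a given effort tier."""
    tokens = output_tokens_at_medium * MULTIPLIER[effort]
    return tokens / 1_000_000 * OUTPUT_PRICE_PER_M

def monthly_cost(effort: str, sessions: int = 50, attorneys: int = 25,
                 output_tokens_at_medium: int = 500_000) -> float:
    """Firm-wide monthly spend for the 25-attorney scenario above."""
    return session_cost(output_tokens_at_medium, effort) * sessions * attorneys

# 500K medium-baseline tokens: $12.50 at medium, roughly $27 at xhigh
```

Because the multipliers are ranges, treat any single number this produces as a planning estimate; swapping in the top of each range gives the worst-case bill instead.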

That math illustrates why effort-level governance matters at scale. For a single one-off complex task, xhigh is the right call. For routine workflow, defaulting to xhigh is overspending. The tokenizer cost calculator lets you model your actual usage profile.

Practical effort-level rules by use case

A working policy template legal teams can adopt:

Default to high for matter-context work. It's the right balance of quality and cost for most analysis, drafting, and review tasks.

Use xhigh for:

- Final brief drafting where the argument structure matters
- Cross-jurisdictional citation analysis
- Complex contract interaction review
- Privilege determinations on close-call documents
- Deposition exam outlines and witness prep strategy
- Settlement scenario modeling with multiple variables

Use max for:

- Appellate brief drafting on dispositive issues
- Constitutional or first-impression questions
- Complex multi-party arbitration strategy
- Bet-the-company transaction structuring

Drop to medium or low for:

- Quick interactive Q&A research sessions
- Routine document classification
- Standard form drafting against playbooks
- Bulk extraction work
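The template above can be encoded as a lookup table that request-routing tooling consults before each call. A minimal sketch; the task-category names paraphrase the bullets above, and the mapping itself is this article's recommendation, not any vendor default:

```python
# Effort-level policy table paraphrasing the template above.
# Category names and assignments are this article's recommendations.

EFFORT_POLICY = {
    "interactive_qa": "medium",
    "document_classification": "medium",
    "form_drafting": "medium",
    "bulk_extraction": "low",
    "final_brief_drafting": "xhigh",
    "citation_analysis": "xhigh",
    "contract_interaction_review": "xhigh",
    "privilege_close_calls": "xhigh",
    "deposition_strategy": "xhigh",
    "settlement_modeling": "xhigh",
    "appellate_dispositive": "max",
    "first_impression": "max",
    "multiparty_arbitration": "max",
    "bet_the_company": "max",
}

def effort_for(task: str) -> str:
    """Return the policy tier; unlisted matter-context work defaults to high."""
    return EFFORT_POLICY.get(task, "high")
```

Defaulting the lookup to high, rather than raising on unknown categories, mirrors the policy's own stance: high is the firm-wide default and the table only names the exceptions.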

For firms running Claude Code on legal-engineering workflows, the xhigh default is usually right; the cost premium pays back in fewer iterations on complex code. For firms running Claude in non-coding workflows, surface the effort-level decision in the firm AI policy. The context window long-document analysis spoke covers the context side of the same procurement decision.

The Bottom Line: xhigh is the right default for complex legal analysis where the deliverable is the analysis itself, and overkill for routine workflow where speed and cost matter more than marginal quality gains. The policy fix is simple: name the effort-level expectation by use case, and accept that Claude Code is xhigh by default unless you change it. For most mid-market firms, a high default with xhigh reserved for designated matter types pays back faster than xhigh-everywhere or high-everywhere.

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.