300+ federal judges have AI standing orders. Almost none require model-version disclosure. That structural gap matters now. OpenAI shipped GPT-5.5 on April 23, 2026, with calibration improvements that reduce hallucination rates compared to GPT-5.4. Anthropic shipped Opus 4.7 on April 16 with similar improvements over 4.6. The Charlotin hallucination database, hosted at HEC Paris, catalogs 1,227 documented sanctions cases globally. Yet federal standing orders treat all model versions as equivalent. With the calibration delta between versions now meaningful, treating them identically is sanctioning the wrong thing. This spoke walks through the structural argument for updating disclosure orders, the firms most exposed to the gap, and the internal-template move that protects firms ahead of the rules.
What federal court AI standing orders currently say
Per Bloomberg Law's federal court judicial standing-orders tracker, Ropes & Gray's AI Court Order Tracker, and the Responsible AI in Legal Services AI Orders resource, 300+ federal judges had AI-related standing orders or local rules as of early 2026. Judge Brantley Starr (NDTX, 2023) issued the original template; most subsequent orders are variations adapted from it.
The orders fragment along several axes:
- Tool name disclosure — some orders require attorneys to identify the AI tool used (ChatGPT, Claude, Gemini, Spellbook, etc.).
- Section disclosure — some require attorneys to flag specific sections of filings drafted with AI assistance.
- Citation verification certification — some require attorneys to certify they verified all AI-generated citations against authoritative sources.
- Use prohibition — a smaller number prohibit AI use entirely for specific filing types.
- No requirement — many federal jurisdictions still have no AI-specific order, leaving general candor-to-tribunal rules to apply.
What almost none require: model version. Starr's 2023 template doesn't differentiate. Most adaptations don't either. Per UNC Law Library's coverage of judicial guidance on generative AI and Spellbook's analysis of when lawyers must disclose AI use, version-specific disclosure is not currently a standard requirement.
Why version matters now (and didn't matter as much in 2023)
In 2023 — when the first AI standing orders were drafted — GPT-3.5, GPT-4, and early Claude were all unreliable in roughly similar ways. The calibration delta between versions was small. Treating "any AI tool" as a single category for disclosure purposes was a reasonable simplification.
In April 2026, that simplification breaks. The calibration delta between GPT-5.4 and GPT-5.5 is meaningful, per OpenAI's GPT-5.5 system card — "less likely to proceed confidently with a bad plan." The delta between Claude Opus 4.6 and 4.7 is similar. The delta between consumer Plus and enterprise Business tiers is meaningful too — different data-handling, different version-update cadence, different effort-level defaults.
The structural argument: when calibration delta between versions matters for sanctions analysis, treating versions identically is sanctioning the wrong thing. A 5.4 hallucination represents a different operational failure than a 5.5 hallucination. The verification protocols that should catch each are different. Insurance carriers pricing risk on each are different. Yet the standing orders treat them as the same disclosure event.
The second-order angle: as model improvements continue at roughly quarterly cadence (5.4→5.5 in six months; Opus 4.6→4.7 in a similar window), the version-mismatch problem compounds. By Q4 2026, standing orders that don't differentiate versions will be treating behavior that differs by 30-50% in actual hallucination rate as if it were identical.
Which firms are most exposed to the gap
Three categories of firms face the largest exposure to the version-disclosure gap:
Litigation-heavy mid-market firms. Per the GPT-5.5 vs Harvey AI / CoCounsel vendor decision spoke, mid-market firms (10-100 attorneys) increasingly use foundation-model AI directly via ChatGPT Business or Claude Team. Without internal disclosure templates that capture model version, these firms have no record of which version produced which output if a sanctions question surfaces later. Multi-attorney teams using shared accounts compound the version-tracking problem.
Solo practitioners using consumer tiers. Solos on ChatGPT Plus ($20/month) or Claude Pro ($20/month) get whichever version OpenAI or Anthropic happens to be serving on the consumer tier at any moment. Version-update cadence on consumer tiers can lag enterprise tiers by days or weeks. A solo who used "GPT-5.5" on April 24 may have actually been served GPT-5.4 if the rollout hadn't reached their account. Without server-side logs, the version is unverifiable.
BigLaw firms running multi-vendor. Per the Anthropic eating the legal stack analysis, BigLaw firms increasingly run GPT-5.5 (via Microsoft Copilot at $30/user/month per Microsoft enterprise pricing), Opus 4.7 (via Microsoft Foundry or claude.ai Team at $25/user/month per Claude pricing), and Gemini 3.1 Pro (via Vertex AI) simultaneously. Cross-vendor version tracking is operationally complex without specific tooling.
The Cherry Hill federal sanction on April 27, 2026 (per The Inquirer) named an attorney who couldn't recall whether he'd used Claude, ChatGPT, or Grok. Without internal disclosure infrastructure, the version question doesn't have a clean answer.
The internal-template move that protects firms ahead of the rules
Firms that update internal disclosure templates to include model version and date of use protect themselves against the rule update that's likely coming. The template addition is small:
Required metadata per AI-assisted draft:
- AI tool name (ChatGPT, Claude, Gemini, etc.)
- Specific model version (GPT-5.5, Opus 4.7, Gemini 3.1 Pro)
- Effort level (standard vs Pro/xhigh — see the Pro vs standard upgrade spoke)
- Date and time of use
- Deployment surface (claude.ai Team, ChatGPT Business, Microsoft Copilot, OpenAI API, AWS Bedrock, etc.)
- Account/user ID for attribution
The metadata travels with the document through review, partner sign-off, and filing. When a sanctions question surfaces — court inquiry, bar regulator, insurance carrier — the firm has the version-specific record. Without it, the firm is reconstructing from memory and email logs.
The operational implementation: the metadata can live as a footer block on the draft, a comment in the document management system, or a structured field in the matter management software. Most matter management platforms (iManage, NetDocuments, Clio) already support custom metadata fields. The setup is one-time work; ongoing capture is associate discipline plus policy enforcement.
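As a minimal sketch of what the per-draft record could look like in practice — the field names, footer format, and example values are illustrative assumptions, not drawn from any platform's actual custom-field schema — the required metadata can be captured once as a structured record and rendered both as a draft footer and as JSON for a document-management custom field:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class AIDraftMetadata:
    """Per-draft AI usage record; all field names are illustrative."""
    tool: str            # e.g. "Claude"
    model_version: str   # e.g. "Opus 4.7"
    effort_level: str    # e.g. "standard" or "xhigh"
    used_at: str         # ISO-8601 timestamp of use
    surface: str         # e.g. "claude.ai Team"
    user_id: str         # account/user ID for attribution

    def footer_block(self) -> str:
        """Render as a plain-text footer appended to the draft."""
        return (
            f"AI-ASSISTED DRAFT | tool={self.tool} | "
            f"version={self.model_version} | effort={self.effort_level} | "
            f"used={self.used_at} | surface={self.surface} | "
            f"user={self.user_id}"
        )

record = AIDraftMetadata(
    tool="Claude",
    model_version="Opus 4.7",
    effort_level="standard",
    used_at=datetime(2026, 4, 24, 14, 30, tzinfo=timezone.utc).isoformat(),
    surface="claude.ai Team",
    user_id="jdoe",
)

print(record.footer_block())          # human-readable footer for the draft
print(json.dumps(asdict(record)))     # structured form for a DMS custom field
```

The same record backs both surfaces: the footer travels with the document text, while the JSON form is what a matter-management custom field or version log would store for later query.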
The second-order benefit: the metadata becomes data. Firms that capture model-version usage across the practice can analyze which versions produce which kinds of outputs, which workflows benefit from Pro vs standard, and which deployment surfaces have version-update lag issues. The data is also useful for AI policy refinement and per-matter cost recovery.
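Once the firm's version log exists, the analysis is simple aggregation. A sketch, using hypothetical log entries shaped like the required-metadata template (the data values are invented for illustration):

```python
from collections import Counter

# Hypothetical version log: one entry per AI-assisted draft,
# shaped like the firm's disclosure-template metadata. Invented data.
version_log = [
    {"version": "GPT-5.4", "surface": "ChatGPT Business"},
    {"version": "GPT-5.5", "surface": "ChatGPT Business"},
    {"version": "GPT-5.5", "surface": "Microsoft Copilot"},
    {"version": "Opus 4.7", "surface": "claude.ai Team"},
    {"version": "GPT-5.4", "surface": "Microsoft Copilot"},
]

# Which versions are actually in use across the practice:
by_version = Counter(entry["version"] for entry in version_log)

# Which deployment surfaces are still serving the older version --
# a proxy for version-update lag worth investigating:
lagging_surfaces = sorted(
    {entry["surface"] for entry in version_log
     if entry["version"] == "GPT-5.4"}
)

print(by_version.most_common())
print(lagging_surfaces)
```

The same few lines answer the underwriting question ("which versions does the firm use, in what proportion?") and the operational one ("which surfaces lag the current version?") directly from captured records rather than from memory.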
What changes when the rules update
When federal court standing orders begin requiring version disclosure (the conservative estimate is 12-18 months from April 2026, but a single high-profile sanctions case could accelerate it to 6 months), three operational changes follow.
First, version disclosure becomes a filing certification. Attorneys signing a brief certify the model name, version, and date of use for any AI-assisted content. Filing without disclosure or with incorrect disclosure becomes its own sanction trigger.
Second, version-specific verification protocols emerge. Different model versions have different failure modes. A 5.4 verification protocol focused on confident-fabrication catches different errors than a 5.5 verification protocol that needs to catch holding-misquote and statute-section drift. State bar ethics opinions are likely to begin specifying version-aware verification standards.
Third, insurance carrier underwriting incorporates version. Per the calibration improvement and AI hallucination sanctions spoke, carriers are already starting to ask which models firms use. Version-specific underwriting follows version-specific sanctions data. Firms with documented version logs get favorable underwriting; firms without get higher premiums or coverage exclusions.
The firm-level move that compounds: update templates now, capture metadata starting today, and build the version log over the next 6-12 months. When the rules update, the firm is already in compliance. Firms that wait for the rules face retroactive reconstruction of records they didn't capture in real time.
The Bottom Line: The current federal AI standing orders treat 5.4 and 5.5 as the same disclosure event. The calibration delta between versions makes that wrong. Update internal disclosure templates now to capture model version, effort level, and deployment surface metadata. Build the version log starting today. When standing orders update — likely within 12-18 months — the firms that captured the data are in compliance and the firms that didn't are reconstructing from memory under sanctions pressure.
AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.
