The numbers the legal AI vendors don't want you to see: Stanford's 2025 study found that Lexis+ AI hallucinated in 17-33% of tested queries, and that Westlaw AI hallucinated at nearly twice that rate. In absolute terms, Lexis+ AI produced correct results 65% of the time; Westlaw AI managed 42%. These aren't consumer chatbots -- these are the premium, enterprise-grade legal research platforms that Thomson Reuters and LexisNexis sell to law firms as reliable AI research tools.
The vendors have pushed back on these findings. Thomson Reuters called the methodology flawed. LexisNexis pointed to improvements since the study was conducted. But neither company has published competing data with comparable rigor. Until they do, Stanford's numbers are the best empirical evidence available -- and they should shape how every attorney uses these tools.
Stanford 2025 Study: The Numbers in Detail
Stanford's study tested legal AI tools by submitting queries whose correct answers were known and objectively verifiable. This wasn't opinion-based evaluation -- it was factual accuracy testing against ground truth.
Lexis+ AI: Hallucination rate of 17-33% depending on query complexity. Overall accuracy rate of 65%. The tool performed best on straightforward citation lookups and worst on complex multi-step legal questions requiring synthesis of multiple sources.
Westlaw AI (CoCounsel): Hallucination rate nearly double that of Lexis+ AI on comparable queries. Overall accuracy rate of 42%. The tool struggled particularly with questions that required distinguishing between similar cases or applying holdings to novel fact patterns.
To put these numbers in context: if you run 10 legal research queries through Lexis+ AI, expect 2-3 to contain hallucinated information. Through Westlaw AI, expect 4-6. These aren't edge cases -- they're the normal operating parameters of tools marketed as reliable legal AI research platforms.
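To make that expectation concrete, here's a minimal back-of-envelope sketch in Python. The Lexis+ AI range comes straight from the study figures above; the Westlaw range is an assumption extrapolated from "nearly double" Lexis's 17-33%.

```python
# Back-of-envelope: expected number of hallucinated results per batch of
# queries, using the hallucination rates cited above. The Westlaw range
# is an assumption derived from "nearly double" Lexis+ AI's 17-33%.

def expected_hallucinations(rate_low: float, rate_high: float, n_queries: int = 10):
    """Expected count of queries containing hallucinated content."""
    return rate_low * n_queries, rate_high * n_queries

lexis_low, lexis_high = expected_hallucinations(0.17, 0.33)
westlaw_low, westlaw_high = expected_hallucinations(0.34, 0.60)  # assumed ~2x Lexis

print(f"Lexis+ AI:  {lexis_low:.1f}-{lexis_high:.1f} hallucinated results per 10 queries")
print(f"Westlaw AI: {westlaw_low:.1f}-{westlaw_high:.1f} hallucinated results per 10 queries")
```

That works out to 1.7-3.3 affected queries per ten for Lexis+ AI and 3.4-6.0 for Westlaw AI -- the 2-3 and 4-6 figures above.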
What the Vendors Suppress
Neither Thomson Reuters nor LexisNexis publishes its own hallucination-rate data. Both companies market their AI tools with claims of accuracy and reliability, but neither provides the empirical evidence to support those claims.
Thomson Reuters' response to Stanford: Challenged the methodology, argued the queries weren't representative of typical use, and pointed to internal testing showing better results. But they didn't publish that internal testing data for independent verification.
LexisNexis' response: Acknowledged the study but emphasized that Lexis+ AI is continuously improving and that accuracy rates vary by query type. They pointed to their Retrieval Augmented Generation (RAG) architecture as a structural advantage -- and they're right that RAG reduces hallucination compared to pure generation, but "less hallucination than ChatGPT" is a low bar.
The absence of vendor-published accuracy data is itself informative. These companies have the resources to conduct and publish rigorous accuracy studies. They haven't. Draw your own conclusions about what that means.
Consumer Chatbots: Even Worse
If enterprise legal AI tools hallucinate in one-sixth to one-third of queries -- and nearly twice that often in Westlaw AI's case -- consumer chatbots are significantly worse for legal research. Independent testing of ChatGPT, Claude, and Gemini on legal questions consistently shows hallucination rates exceeding 30% -- and these tools will fabricate entire case citations, complete with realistic-sounding party names, docket numbers, and holdings.
The difference between enterprise legal AI and consumer chatbots is architectural. Lexis+ AI and CoCounsel are connected to actual legal databases through RAG systems -- they retrieve real documents before generating responses. Consumer chatbots generate text based on training data patterns with no connection to verified legal sources. They don't "look up" cases; they predict what a case citation should look like based on patterns.
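To make that architectural difference concrete, here's a toy sketch -- purely illustrative, with hypothetical names and citations, not any vendor's actual system:

```python
# Toy illustration of retrieval-grounded vs. pattern-predicted answers.
# Everything here -- the database, the cases, the functions -- is hypothetical.

VERIFIED_DATABASE = {
    # topic -> a citation that actually exists in the (toy) database
    "adverse possession": "Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)",
}

def rag_style_answer(query: str) -> str:
    """Enterprise pattern: retrieve a real record first, answer only from it."""
    for topic, citation in VERIFIED_DATABASE.items():
        if topic in query.lower():
            return f"Grounded in retrieved document: {citation}"
    return "No supporting document retrieved."  # declines rather than invents

def pure_generation_answer(query: str) -> str:
    """Consumer pattern: no lookup; it emits a citation-shaped prediction."""
    return "Doe v. Roe, 999 F.2d 111 (1st Cir. 1993)"  # plausible-looking, unverified

print(rag_style_answer("leading cases on adverse possession"))
print(pure_generation_answer("leading cases on adverse possession"))
```

The retrieval path can only cite what it actually found; the pure-generation path produces a citation whether or not the case exists.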
Clearbrief stands apart from both categories. It uses a semantic approach that can't hallucinate citations because it only references documents you provide or that exist in verified databases. It doesn't generate legal analysis from scratch. This architectural difference is why Clearbrief has a fundamentally different reliability profile than generative AI tools.
What These Numbers Mean for Your Practice
When the best tool hallucinates in 17-33% of queries and its competitor does so roughly twice as often, verification isn't optional -- it's mathematically necessary. Here's what the data demands:
Every AI-generated citation must be independently verified. Not spot-checked. Not sampled. Every single one. At a 17% hallucination rate, checking only half your citations leaves roughly a 9% chance per query that a fabricated citation makes it into your filing. Over dozens of filings, that's a near-certainty of submitting fake law to a court -- the sketch after this list works through the math.
Tool selection is a professional judgment decision. Lexis+ AI at 65% accuracy is measurably better than Westlaw AI at 42% accuracy. If you're choosing between them, the Stanford data gives you an empirical basis for that choice. But neither is reliable enough to use without verification.
AI should accelerate research, not replace it. The right workflow is: use AI to identify potentially relevant cases and arguments, then verify everything through traditional legal research. AI saves you the initial search time. Verification ensures you don't submit hallucinated results.
Document your verification. When courts see hallucination rates this high from the best legal AI tools, they'll expect attorneys to show they verified. A documented verification workflow is your best defense.
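For the verification math in the first point above, here's a minimal sketch, assuming hallucinations are independent across queries and filings (a simplification) and treating "dozens" as 36:

```python
# Slip-through math from the verification point above. Assumes hallucination
# is independent across queries and filings -- a simplifying assumption.

def slip_through_per_filing(p_hallucination: float, verify_fraction: float) -> float:
    """Chance a fabricated citation survives partial verification in one filing."""
    return p_hallucination * (1.0 - verify_fraction)

def at_least_one_slip(p_per_filing: float, n_filings: int) -> float:
    """Chance at least one fabricated citation gets filed across n filings."""
    return 1.0 - (1.0 - p_per_filing) ** n_filings

p = slip_through_per_filing(0.17, 0.5)   # 17% rate, half the citations checked
print(f"Per-filing slip-through: {p:.1%}")                    # ~8.5%, the ~9% above
print(f"Across 36 filings: {at_least_one_slip(p, 36):.1%}")   # ~96%: near-certainty
```

At an 8.5% per-filing risk, the probability of at least one fabricated citation reaching a court crosses 95% by the third dozen filings.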
The Accuracy Trajectory: Is It Getting Better?
Both vendors claim their tools are improving, and there's reason to believe that's true. RAG architectures are getting better at retrieving relevant documents, and generation models are getting better at synthesizing retrieved information accurately. LexisNexis, in particular, has invested heavily in reducing hallucination through architectural improvements.
But the trajectory matters less than the current state. Even if hallucination rates dropped by half from Stanford's findings, you'd still be looking at roughly 9-17% for Lexis+ AI and 20% or more for Westlaw AI. That's still far too high for unverified use.
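A quick sanity check on those halved figures, again assuming Westlaw's range is roughly double Lexis+ AI's:

```python
# Halving the hallucination ranges used above. The Westlaw range is an
# assumption extrapolated from "nearly double" Lexis+ AI's 17-33%.

def halved(low: float, high: float) -> tuple[float, float]:
    return low / 2, high / 2

print(halved(0.17, 0.33))  # Lexis+ AI -> (0.085, 0.165): roughly 9-17%
print(halved(0.34, 0.60))  # Westlaw AI -> (0.17, 0.30): around 20% or more
```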
The honest timeline for legal AI accuracy that eliminates the need for verification is years away, not months. Any vendor telling you their tool is reliable enough to use without checking is selling you a liability, not a product. The firms that will thrive are the ones building verification into their workflow today -- not the ones waiting for the technology to become perfect.
The Bottom Line: Stanford's 2025 data shows Lexis+ AI hallucinating in 17-33% of queries (65% accurate) and Westlaw AI hallucinating at nearly twice that rate (42% accurate) -- making verification of every AI-generated citation non-optional.
AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.
