The numbers the legal AI vendors don't want you to see: Stanford's 2025 study found that Lexis+ AI hallucinated in 17-33% of tested queries, and that Westlaw AI hallucinated at nearly twice that rate. In absolute terms, Lexis+ AI produced correct results 65% of the time; Westlaw AI managed 42%. These aren't consumer chatbots -- these are the premium, enterprise-grade legal research platforms that Thomson Reuters and LexisNexis sell to law firms as reliable AI research tools.
The vendors have pushed back on these findings. Thomson Reuters called the methodology flawed. LexisNexis pointed to improvements since the study was conducted. But neither company has published competing data with comparable rigor. Until they do, Stanford's numbers are the best empirical evidence available -- and they should shape how every attorney uses these tools.
Stanford 2025 Study: The Numbers in Detail
Stanford's study tested legal AI tools by submitting queries whose correct answers were known and objectively verifiable. This wasn't opinion-based evaluation -- it was factual accuracy testing against ground truth.
Lexis+ AI: Hallucination rate of 17-33% depending on query complexity. Overall accuracy rate of 65%. The tool performed best on straightforward citation lookups and worst on complex multi-step legal questions requiring synthesis of multiple sources.
Westlaw AI (CoCounsel): Hallucination rate nearly double that of Lexis+ AI on comparable queries. Overall accuracy rate of 42%. The tool struggled particularly with questions that required distinguishing between similar cases or applying holdings to novel fact patterns.
To put these numbers in context: if you run 10 legal research queries through Lexis+ AI, expect 2-3 to contain hallucinated information. Through Westlaw AI, expect 4-6. These aren't edge cases -- they're the normal operating parameters of tools marketed as reliable legal AI research platforms.
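To make that expectation concrete, here's a minimal back-of-envelope sketch in Python. The Lexis+ AI range comes straight from the study figures above; the Westlaw range is an assumption extrapolated from "nearly double" Lexis's 17-33%.

```python
# Back-of-envelope: expected number of hallucinated results per batch of
# queries, using the hallucination rates cited above. The Westlaw range
# is an assumption derived from "nearly double" Lexis+ AI's 17-33%.

def expected_hallucinations(rate_low: float, rate_high: float, n_queries: int = 10):
    """Expected count of queries containing hallucinated content."""
    return rate_low * n_queries, rate_high * n_queries

lexis_low, lexis_high = expected_hallucinations(0.17, 0.33)
westlaw_low, westlaw_high = expected_hallucinations(0.34, 0.60)  # assumed ~2x Lexis

print(f"Lexis+ AI:  {lexis_low:.1f}-{lexis_high:.1f} hallucinated results per 10 queries")
print(f"Westlaw AI: {westlaw_low:.1f}-{westlaw_high:.1f} hallucinated results per 10 queries")
```

That works out to 1.7-3.3 affected queries per ten for Lexis+ AI and 3.4-6.0 for Westlaw AI -- the 2-3 and 4-6 figures above.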
What the Vendors Suppress
Neither Thomson Reuters nor LexisNexis publishes its own hallucination-rate data. Both companies market their AI tools with claims of accuracy and reliability, but neither provides the empirical evidence to support those claims.
Thomson Reuters' response to Stanford: Challenged the methodology, argued the queries weren't representative of typical use, and pointed to internal testing showing better results. But they didn't publish that internal testing data for independent verification.
LexisNexis' response: Acknowledged the study but emphasized that Lexis+ AI is continuously improving and that accuracy rates vary by query type. They pointed to their Retrieval Augmented Generation (RAG) architecture as a structural advantage -- and they're right that RAG reduces hallucination compared to pure generation, but "less hallucination than ChatGPT" is a low bar.
The absence of vendor-published accuracy data is itself informative. These companies have the resources to conduct and publish rigorous accuracy studies. They haven't. Draw your own conclusions about what that means.
Consumer Chatbots: Even Worse
If enterprise legal AI tools hallucinate in one-sixth to one-third of queries -- and nearly twice that often in Westlaw AI's case -- consumer chatbots are significantly worse for legal research. Independent testing of ChatGPT, Claude, and Gemini on legal questions consistently shows hallucination rates exceeding 30% -- and these tools will fabricate entire case citations, complete with realistic-sounding party names, docket numbers, and holdings.
The difference between enterprise legal AI and consumer chatbots is architectural. Lexis+ AI and CoCounsel are connected to actual legal databases through RAG systems -- they retrieve real documents before generating responses. Consumer chatbots generate text based on training data patterns with no connection to verified legal sources. They don't "look up" cases; they predict what a case citation should look like based on patterns.
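To make that architectural difference concrete, here's a toy sketch -- purely illustrative, with hypothetical names and citations, not any vendor's actual system:

```python
# Toy illustration of retrieval-grounded vs. pattern-predicted answers.
# Everything here -- the database, the cases, the functions -- is hypothetical.

VERIFIED_DATABASE = {
    # topic -> a citation that actually exists in the (toy) database
    "adverse possession": "Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)",
}

def rag_style_answer(query: str) -> str:
    """Enterprise pattern: retrieve a real record first, answer only from it."""
    for topic, citation in VERIFIED_DATABASE.items():
        if topic in query.lower():
            return f"Grounded in retrieved document: {citation}"
    return "No supporting document retrieved."  # declines rather than invents

def pure_generation_answer(query: str) -> str:
    """Consumer pattern: no lookup; it emits a citation-shaped prediction."""
    return "Doe v. Roe, 999 F.2d 111 (1st Cir. 1993)"  # plausible-looking, unverified

print(rag_style_answer("leading cases on adverse possession"))
print(pure_generation_answer("leading cases on adverse possession"))
```

The retrieval path can only cite what it actually found; the pure-generation path produces a citation whether or not the case exists.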
Clearbrief stands apart from both categories. It uses a semantic approach that can't hallucinate citations because it only references documents you provide or that exist in verified databases. It doesn't generate legal analysis from scratch. This architectural difference is why Clearbrief has a fundamentally different reliability profile than generative AI tools.
What These Numbers Mean for Your Practice
When the best tool hallucinates in 17-33% of queries and its competitor does so roughly twice as often, verification isn't optional -- it's mathematically necessary. Here's what the data demands:
Every AI-generated citation must be independently verified. Not spot-checked. Not sampled. Every single one. At a 17% hallucination rate, checking only half your citations leaves roughly a 9% chance per query that a fabricated citation makes it into your filing. Over dozens of filings, that's a near-certainty of submitting fake law to a court -- the sketch after this list works through the math.
Tool selection is a professional judgment decision. Lexis+ AI at 65% accuracy is measurably better than Westlaw AI at 42% accuracy. If you're choosing between them, the Stanford data gives you an empirical basis for that choice. But neither is reliable enough to use without verification.
AI should accelerate research, not replace it. The right workflow is: use AI to identify potentially relevant cases and arguments, then verify everything through traditional legal research. AI saves you the initial search time. Verification ensures you don't submit hallucinated results.
Document your verification. When courts see hallucination rates this high from the best legal AI tools, they'll expect attorneys to show they verified. A documented verification workflow is your best defense.
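For the verification math in the first point above, here's a minimal sketch, assuming hallucinations are independent across queries and filings (a simplification) and treating "dozens" as 36:

```python
# Slip-through math from the verification point above. Assumes hallucination
# is independent across queries and filings -- a simplifying assumption.

def slip_through_per_filing(p_hallucination: float, verify_fraction: float) -> float:
    """Chance a fabricated citation survives partial verification in one filing."""
    return p_hallucination * (1.0 - verify_fraction)

def at_least_one_slip(p_per_filing: float, n_filings: int) -> float:
    """Chance at least one fabricated citation gets filed across n filings."""
    return 1.0 - (1.0 - p_per_filing) ** n_filings

p = slip_through_per_filing(0.17, 0.5)   # 17% rate, half the citations checked
print(f"Per-filing slip-through: {p:.1%}")                    # ~8.5%, the ~9% above
print(f"Across 36 filings: {at_least_one_slip(p, 36):.1%}")   # ~96%: near-certainty
```

At an 8.5% per-filing risk, the probability of at least one fabricated citation reaching a court crosses 95% by the third dozen filings.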
The Accuracy Trajectory: Is It Getting Better?
Both vendors claim their tools are improving, and there's reason to believe that's true. RAG architectures are getting better at retrieving relevant documents, and generation models are getting better at synthesizing retrieved information accurately. LexisNexis, in particular, has invested heavily in reducing hallucination through architectural improvements.
But the trajectory matters less than the current state. Even if hallucination rates dropped by half from Stanford's findings, you'd still be looking at roughly 9-17% for Lexis+ AI and 20% or more for Westlaw AI. That's still far too high for unverified use.
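A quick sanity check on those halved figures, again assuming Westlaw's range is roughly double Lexis+ AI's:

```python
# Halving the hallucination ranges used above. The Westlaw range is an
# assumption extrapolated from "nearly double" Lexis+ AI's 17-33%.

def halved(low: float, high: float) -> tuple[float, float]:
    return low / 2, high / 2

print(halved(0.17, 0.33))  # Lexis+ AI -> (0.085, 0.165): roughly 9-17%
print(halved(0.34, 0.60))  # Westlaw AI -> (0.17, 0.30): around 20% or more
```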
The honest timeline for legal AI accuracy that eliminates the need for verification is years away, not months. Any vendor telling you their tool is reliable enough to use without checking is selling you a liability, not a product. The firms that will thrive are the ones building verification into their workflow today -- not the ones waiting for the technology to become perfect.
The Bottom Line: Stanford's 2025 data shows Lexis+ AI hallucinating in 17-33% of queries (65% accurate) and Westlaw AI hallucinating at nearly twice that rate (42% accurate) -- making verification of every AI-generated citation non-optional.
AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.
