Stanford's 2025 study on legal AI accuracy dropped numbers that should alarm every managing partner relying on AI-assisted research. Lexis+ AI hallucinated on 17-33% of queries. Westlaw AI hallucinated at nearly twice that rate. These aren't edge cases or adversarial prompts; they're standard legal research questions asked the way a practicing lawyer would ask them.
The data vendors don't advertise these numbers. LexisNexis achieved 65% accuracy on verified legal queries. Westlaw managed 42%. That means fewer than half the research queries you run through Westlaw's AI come back fully accurate, roughly two out of every five. If an associate performed at that level, you'd fire them. But firms are trusting these tools with client matters every day without understanding the failure rate.
What Stanford Actually Tested and Found
The Stanford study evaluated retrieval-augmented generation (RAG) systems — the architecture both Lexis+ AI and Westlaw AI use. RAG is supposed to solve hallucinations by grounding AI responses in actual legal databases. The study tested whether it actually does.
Researchers submitted hundreds of legal research queries across practice areas and verified every citation, quote, and legal proposition in the outputs. Lexis+ AI's hallucination rate ranged from 17% on straightforward queries to 33% on complex multi-issue questions. Westlaw AI's numbers were consistently worse, with accuracy at 42% compared to Lexis's 65%. The hallucinations weren't random gibberish — they were plausible-sounding citations to real courts with fabricated case names, or real case names with fabricated holdings. That's the dangerous kind.
Why RAG Doesn't Fix the Problem
Legal AI vendors sold RAG as the solution to hallucinations. The pitch: "Our AI is grounded in actual legal databases, so it can't make things up." The Stanford data proves that's marketing, not reality.
RAG reduces hallucinations but doesn't eliminate them. The retrieval step can pull irrelevant documents. The generation step can mischaracterize what it retrieved. The synthesis step can combine accurate fragments into inaccurate conclusions. When a RAG system retrieves a real case but summarizes the holding incorrectly, that's worse than an obvious fabrication — because the citation checks out but the law is wrong. A lawyer who verifies the case exists but doesn't read the actual opinion will miss the error entirely.
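To see where those failure points live, here is a minimal sketch of a RAG pipeline in Python. It is an illustration under stated assumptions, not how Lexis+ AI or Westlaw AI is actually built: the toy index, the keyword scoring, and function names like retrieve and generate are all hypothetical. The comments mark where each failure mode enters.

```python
# Minimal RAG sketch: hypothetical names throughout, not any vendor's API.

def retrieve(query: str, index: dict[str, str], k: int = 2) -> list[str]:
    # Failure point 1: retrieval. Naive keyword overlap can surface an
    # irrelevant opinion, which downstream steps then treat as authority.
    scored = sorted(
        index.items(),
        key=lambda kv: sum(w in kv[1].lower() for w in query.lower().split()),
        reverse=True,
    )
    return [f"{name}: {text}" for name, text in scored[:k]]

def generate(query: str, passages: list[str]) -> str:
    # Failure point 2: generation. A real system calls an LLM with these
    # passages as grounding; even then the model can restate a holding
    # incorrectly. Stubbed here so the sketch runs without an API key.
    return f"Answer to {query!r}, citing {len(passages)} retrieved passages."

def answer(query: str, index: dict[str, str]) -> str:
    # Failure point 3: synthesis. Individually accurate fragments can be
    # combined into a conclusion that none of the cited cases supports.
    return generate(query, retrieve(query, index))

if __name__ == "__main__":
    toy_index = {  # invented placeholder cases, not real citations
        "Case A": "Holding: summary judgment requires no genuine dispute of material fact.",
        "Case B": "Holding: personal jurisdiction requires minimum contacts with the forum.",
    }
    print(answer("What is the summary judgment standard?", toy_index))
```

Notice that nothing in this pipeline checks the final answer against the retrieved text. That gap is exactly why a correct-looking citation can carry a wrong holding.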
What the Vendors Don't Tell You
Neither LexisNexis nor Westlaw publishes its own accuracy benchmarks in a way that allows independent verification. When pressed on the Stanford numbers, both vendors pointed to internal testing that supposedly shows higher accuracy rates. Neither has released that data for peer review.
Here's what the sales reps won't mention: accuracy varies dramatically by practice area. Complex regulatory questions, multi-jurisdictional issues, and recent case law perform worst. The tools are most accurate on well-established, frequently cited propositions — exactly the research you least need AI help with. The harder the question, the less you can trust the answer. Both platforms also perform worse on state law than federal law, which matters for the majority of practicing lawyers who work primarily in state courts.
The Real Cost of a 1-in-3 Failure Rate
A 17-33% hallucination rate doesn't just risk sanctions. It compounds across a practice. If your firm runs 100 AI-assisted research queries per week and one-third contain some form of hallucination, that's 30+ potentially flawed research memos hitting partner desks every week.
Not all hallucinations lead to filed documents. But they waste associate time chasing phantom authorities, create false confidence in legal positions, and occasionally make it into briefs. The Portland attorney who paid $109,700 in sanctions relied on AI output he didn't verify. The Mata v. Avianca lawyers trusted their tool. At a 1-in-3 failure rate, the question isn't whether your firm will file a hallucinated citation — it's when. And the malpractice implications extend beyond sanctions to client harm on matters where flawed research shaped case strategy.
What Lawyers Should Actually Verify
Given these accuracy rates, every AI research output requires verification, but the verification needs to be targeted. Check these five things on every AI research response:
1. Case existence: Confirm every cited case exists in an actual reporter. Don't just search the case name; verify the citation format, court, and year.
2. Holding accuracy: Read the actual opinion. AI frequently gets the court right but the holding wrong, sometimes stating the opposite of what the court held.
3. Current status: Check that cited cases haven't been overruled, distinguished, or superseded. AI training data has cutoff dates; Shepardize everything.
4. Quotation accuracy: If the AI puts text in quotation marks, verify word-for-word against the source. Fabricated quotes from real cases are the most common hallucination type in RAG systems.
5. Logical synthesis: Even when individual citations are accurate, the AI may combine them into a legal argument the cases don't actually support. Verify the reasoning chain, not just the components.
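If you want that checklist to live somewhere more durable than a sticky note, here is a minimal sketch of the five checks as a structured review record in Python. The class and field names are hypothetical, invented for this illustration; the design point is that an AI answer earns no trust until a human reviewer has cleared every flag against the primary sources.

```python
# Hypothetical review record for the five-point check; not a vendor API.

from dataclasses import dataclass, field

@dataclass
class CitationCheck:
    citation: str
    exists_in_reporter: bool = False  # 1. case confirmed in an actual reporter
    holding_matches: bool = False     # 2. opinion read; holding matches AI summary
    still_good_law: bool = False      # 3. Shepardized; not overruled or superseded
    quotes_verbatim: bool = False     # 4. quoted text matches source word-for-word

    def cleared(self) -> bool:
        # A citation clears only when every individual check has passed.
        return all((self.exists_in_reporter, self.holding_matches,
                    self.still_good_law, self.quotes_verbatim))

@dataclass
class ResearchReview:
    query: str
    checks: list[CitationCheck] = field(default_factory=list)
    synthesis_verified: bool = False  # 5. reasoning chain holds across the cases

    def ready_for_partner(self) -> bool:
        # Defaults are all False, so the output stays untrusted until proven.
        return (bool(self.checks)
                and all(c.cleared() for c in self.checks)
                and self.synthesis_verified)

if __name__ == "__main__":
    review = ResearchReview(query="Summary judgment standard")
    review.checks.append(CitationCheck(citation="Placeholder v. Example (invented)"))
    print(review.ready_for_partner())  # False until a human clears every flag
```

Defaulting every flag to False encodes the Stanford finding directly: the burden of proof sits on verification, not on the AI's confidence.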
The Bottom Line: Your legal AI vendor is selling you a tool with a documented 1-in-3 failure rate and calling it innovation. The Stanford data is clear: no legal AI platform is reliable enough to use without full human verification on every output. Treat AI research as a first draft from an unreliable summer clerk, not a finished work product.
AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.
