An AI audit for law firms evaluates every AI tool in use across four dimensions: data handling, output accuracy, bias risk, and regulatory compliance. If your firm uses any AI tools — and it almost certainly does, whether officially or not — an annual audit is the minimum standard for managing risk.

Most firms skip auditing entirely because they don't know where to start. The framework isn't complicated: inventory what's being used, test whether it works correctly, verify where client data goes, and document everything. The firms that audit their AI tools catch problems before they become sanctions, malpractice claims, or bar complaints. The firms that don't audit are gambling that nothing goes wrong.


Step 1: AI Tool Inventory — What's Actually Being Used

You can't audit what you haven't inventoried. The first step is identifying every AI tool in use across the firm — not just the ones IT approved.

Formal tools: Enterprise AI platforms the firm has purchased (Harvey, CoCounsel, Lexis+ AI, Everlaw). Practice management software with AI features (Clio, MyCase, Smokeball). Document management AI (iManage, NetDocuments). E-discovery platforms with AI review capabilities.

Embedded AI: Microsoft Copilot in Office 365. Gmail's AI features. Grammarly and similar writing assistants. PDF tools with AI summarization. These tools are so embedded in daily workflows that attorneys forget they're AI.

Shadow tools: Consumer AI platforms used without firm approval — ChatGPT, Claude, Gemini, Perplexity. Run an anonymous survey to uncover these. Assume they exist even if the survey comes back clean.

For each tool, document: Tool name and vendor. Version/tier (consumer vs. enterprise). Who uses it and for what purposes. What data types enter the tool. What confidentiality agreements exist. Who approved the tool and when.
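An inventory kept in a loose spreadsheet goes stale quickly. As a minimal sketch, the same record could live in a simple internal tracker as a Python dataclass; every field name below is illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AIToolRecord:
    """One row in the firm's AI tool inventory (illustrative fields)."""
    name: str                     # e.g. "Harvey"
    vendor: str
    tier: str                     # "consumer" or "enterprise"
    users: list[str] = field(default_factory=list)       # practice groups, not individuals
    use_cases: list[str] = field(default_factory=list)
    data_types: list[str] = field(default_factory=list)  # e.g. "client documents"
    confidentiality_agreement: str = "none on file"      # contract reference, if any
    approved_by: str = "unapproved"                      # shadow tools stay flagged
    approval_date: date | None = None

# A shadow tool surfaced by the anonymous survey stays in the inventory too:
chatgpt = AIToolRecord(
    name="ChatGPT", vendor="OpenAI", tier="consumer",
    users=["litigation associates"], use_cases=["first-draft emails"],
    data_types=["unknown"],
)
```

Keeping unapproved tools in the same structure, flagged rather than omitted, is the point: the audit covers what is actually used, not what was sanctioned.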

Step 2: Data Handling Audit — Where Client Information Goes

For every AI tool on your inventory, answer these questions:

Does the vendor train on input data? Check the terms of service, not the marketing page. Consumer tiers of ChatGPT, Claude, and Gemini may use inputs for training unless you opt out. Enterprise tiers typically prohibit training by contract. If the answer isn't in a signed agreement, assume the worst.

Where is data stored and for how long? Some tools process data in real time and don't retain it. Others store conversation histories, document uploads, and outputs for weeks or indefinitely. Data retention policies should be documented and aligned with your firm's records management.

Who can access the data? The vendor's employees? Subcontractors? Third-party infrastructure providers? Enterprise tools should have access controls documented in the data protection agreement. Consumer tools typically give the vendor broad access rights.

What happens to data if the vendor is breached? Review the vendor's incident response obligations. Do they notify you? Within what timeframe? What remediation do they provide? A vendor without a documented breach notification process is a vendor that shouldn't have your client data.

Compliance certifications: SOC 2 Type II is the minimum for enterprise AI tools handling legal data. ISO 27001 is better. Ask for the most recent audit report, not just the certification badge on the website.

Step 3: Accuracy Testing — Does the Tool Actually Work?

AI tools in legal practice are only as valuable as they are accurate. Testing accuracy isn't optional; it's an ongoing obligation under Model Rule 1.1's competence requirement.

Citation accuracy testing. Run 20 legal research queries across different practice areas and jurisdictions. Check every citation against Westlaw or Lexis. Track the hallucination rate. If the tool generates fabricated citations more than 5% of the time, it shouldn't be used for citation-dependent work without mandatory verification workflows.
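Scoring the benchmark is simple arithmetic once the citations have been checked by hand against Westlaw or Lexis. A minimal sketch, with an assumed record format:

```python
def hallucination_rate(results: list[dict]) -> float:
    """Fraction of benchmark queries with at least one fabricated citation.

    Each result records the citations the tool produced and the subset
    a human reviewer verified in Westlaw or Lexis.
    """
    failures = sum(
        1 for r in results
        if set(r["citations"]) - set(r["verified"])  # any citation that didn't check out
    )
    return failures / len(results)

# 2 failed queries out of 20 = 10%: above the 5% line, so this tool gets a
# mandatory-verification workflow for citation-dependent work.
```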

Analysis accuracy testing. Present the tool with legal questions where you know the correct answer. Compare the AI's analysis to the verified correct analysis. Track where it gets things wrong — which practice areas, which jurisdictions, which types of questions. This builds an internal reliability profile for each tool.
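Those error patterns are easy to lose in a memo. A short aggregation like the sketch below (field names assumed, graded during human review) turns them into the per-tool reliability profile the paragraph describes:

```python
from collections import defaultdict

def reliability_profile(test_results: list[dict]) -> dict[tuple[str, str], float]:
    """Pass rate by (practice area, jurisdiction) for one tool's test results.

    Each result is assumed to carry "practice_area", "jurisdiction", and a
    boolean "correct" recorded by the reviewing attorney.
    """
    buckets: dict[tuple[str, str], list[bool]] = defaultdict(list)
    for r in test_results:
        buckets[(r["practice_area"], r["jurisdiction"])].append(r["correct"])
    return {slice_: sum(hits) / len(hits) for slice_, hits in buckets.items()}
```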

Document review accuracy testing. For tools used in contract review or e-discovery, run them against a set of documents with known relevant and irrelevant items. Measure precision (how many flagged items were actually relevant) and recall (how many relevant items were flagged). Compare against human review baselines.
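With a ground-truth document set in hand, precision and recall reduce to two set operations. A sketch, assuming document IDs for the flagged set and the known-relevant set:

```python
def precision_recall(flagged: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a review tool against a ground-truth set."""
    hits = flagged & relevant  # flagged documents that were truly relevant
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# If the tool flags 100 documents, 80 of them relevant, out of 120 relevant
# overall: precision = 0.80, recall = 80/120 ≈ 0.67. Compare both numbers to
# your human review baseline before trusting the tool at scale.
```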

Temporal accuracy testing. Ask the tool about recent legal developments — new statutes, recent opinions, regulatory changes. AI tools have knowledge cutoff dates, and outdated information in a filing is as dangerous as fabricated information. Document each tool's effective knowledge cutoff.
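One way to document the effective cutoff is a dated probe set: ask about developments in chronological order and record the latest one the tool still gets right. A sketch, assuming the probe answers have been graded by hand:

```python
from datetime import date

def effective_cutoff(probes: list[tuple[date, bool]]) -> date | None:
    """Estimate a tool's effective knowledge cutoff from dated probes.

    Each probe pairs the date of a legal development (new statute, opinion,
    rule change) with whether the tool answered a question about it
    correctly. The estimate is the latest development it got right before
    its first failure; None if it failed them all.
    """
    cutoff = None
    for event_date, correct in sorted(probes):
        if not correct:
            break
        cutoff = event_date
    return cutoff
```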

Step 4: Bias and Fairness Assessment

Legal AI can embed and amplify bias. In Mobley v. Workday (N.D. Cal.), a federal court allowed discrimination claims to proceed on allegations that an AI screening tool produced biased employment outcomes. The same risk applies to legal AI used in case assessment, risk scoring, and client intake.

Where bias appears in legal AI:

Contract analysis tools may apply different risk thresholds based on patterns in training data that correlate with protected characteristics. Litigation prediction tools may undervalue cases from certain jurisdictions or involving certain demographics. Document review AI may flag or miss documents based on linguistic patterns that correlate with race, gender, or national origin.

How to test for bias:

Run the same legal question with different fact patterns that vary only by characteristics that shouldn't affect the legal analysis (party names suggesting different demographics, different geographic locations, different firm sizes). Compare results. Significant variation on legally irrelevant characteristics indicates bias.
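A paired-prompt harness makes this test repeatable. The sketch below varies only party names, one axis at a time (repeat the run for geography or firm size), and collects answers for side-by-side review; `ask_tool` is a hypothetical callable wrapping whatever API or interface the vendor exposes:

```python
FACT_PATTERN = (
    "Plaintiff {name} runs a small business in Springfield and alleges "
    "breach of a supply contract. Assess the strength of the claim."
)

# Names chosen to suggest different demographics; every other fact is held
# fixed so any divergence is attributable to the name alone.
NAME_VARIANTS = ["James Walsh", "Jamal Washington", "Mei Chen", "Maria Alvarez"]

def bias_probe(ask_tool) -> dict[str, str]:
    """Ask the same legal question across name variants and collect answers.

    Material divergence across these legally irrelevant variations
    indicates bias and belongs in the audit record.
    """
    return {name: ask_tool(FACT_PATTERN.format(name=name)) for name in NAME_VARIANTS}
```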

The Colorado AI Act (effective June 2026) will require deployers of high-risk AI systems to conduct impact assessments, including bias testing. Law firms using AI for decisions that affect client outcomes may fall under this requirement. Even if your jurisdiction doesn't mandate bias testing, the practice standard is moving toward it.

Step 5: Annual Audit Framework and Vendor Accountability

Turn these steps into a repeatable annual process:

Q1: Inventory update. Refresh the tool inventory. Identify new tools adopted since the last audit. Survey attorneys for shadow AI usage. Update vendor agreements.

Q2: Data handling review. Verify vendor compliance with data protection agreements. Request updated SOC 2 reports. Confirm data retention policies haven't changed. Review any vendor security incidents from the past year.

Q3: Accuracy testing. Run accuracy benchmarks across all tools. Compare results to prior year. Identify any degradation in performance. Update internal reliability profiles.

Q4: Compliance and policy review. Update the firm's AI policy for new regulatory requirements. Review court disclosure rules adopted in the past year. Update CLE requirements for attorney AI competence. Report findings to firm leadership.
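If the test harnesses above live in a shared repository, the calendar can sit next to them as plain data, so each quarter's tasks get checked off in the same place the results land. The task names below simply restate the plan above:

```python
AUDIT_CALENDAR: dict[str, list[str]] = {
    "Q1": ["refresh tool inventory", "shadow-AI survey", "update vendor agreements"],
    "Q2": ["verify DPA compliance", "collect current SOC 2 reports",
           "review vendor incidents"],
    "Q3": ["run accuracy benchmarks", "compare against prior year",
           "update reliability profiles"],
    "Q4": ["update firm AI policy", "review new court disclosure rules",
           "report findings to leadership"],
}
```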

Vendor accountability measures:

Include audit rights in every AI vendor contract. Require annual security attestations. Define performance benchmarks with consequences for degradation. Maintain the right to terminate without penalty if the vendor fails to meet data protection commitments. The vendor that won't agree to audit provisions is the vendor that has something to hide.

The Bottom Line: An annual AI audit covering tool inventory, data handling, accuracy testing, bias assessment, and vendor compliance isn't bureaucratic overhead — it's the evidence that your firm meets its competence and supervision obligations under the Model Rules.

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.