Half the conversations about legal AI fail because people use the same words to mean different things. When a vendor says 'AI-powered,' they might mean a GPT-4 integration, a rules-based automation, or a fancy search algorithm. When a managing partner says 'hallucination,' they might mean any AI error — not the specific phenomenon of fabricated citations.
This glossary defines 50+ legal AI terms with precision, in plain language, with legal-practice context. Bookmark it. Share it with your technology committee. Reference it before your next vendor demo so you can ask questions that actually mean something.
Core AI Concepts
Artificial Intelligence (AI): Software that performs tasks typically requiring human intelligence — reasoning, pattern recognition, language understanding. In legal, this ranges from simple document automation to complex case analysis. Not all legal tech is AI; most is traditional software with AI marketing.
Large Language Model (LLM): The foundation of modern legal AI. GPT-4, Claude, Gemini, and Llama are LLMs trained on massive text datasets. They predict the next word in a sequence, producing human-like text. Legal AI tools are mostly LLM-powered interfaces with legal-specific features on top.
Generative AI (GenAI): AI that creates new content — text, images, code. In legal, this means AI that drafts documents, generates research memos, and produces analysis. Distinct from analytical AI, which classifies or scores existing content.
Natural Language Processing (NLP): AI's ability to understand and generate human language. Every legal AI tool that reads documents, answers questions, or drafts text uses NLP. Older NLP (keyword matching, rules-based) is different from modern LLM-powered NLP.
Machine Learning (ML): AI that improves through data exposure rather than explicit programming. Predictive coding in e-discovery is ML — the system learns what's relevant from reviewer decisions. LLMs are a type of ML.
Foundation Model: A large AI model trained on broad data that serves as the base for specialized applications. Claude, GPT-4, and Gemini are foundation models. Harvey, CoCounsel, and other legal AI tools are applications built on foundation models.
Fine-Tuning: Adapting a foundation model for a specific domain by training it on additional domain-specific data. Some legal AI tools fine-tune foundation models on legal text to improve legal reasoning and reduce hallucinations.
Legal AI-Specific Terms
Hallucination: When AI generates content that sounds plausible but is factually wrong — fabricated case citations, invented statutes, or made-up holdings. The primary risk in legal AI. It isn't a bug that can be fully fixed; it's a fundamental characteristic of how LLMs generate text.
Retrieval-Augmented Generation (RAG): Combining LLM reasoning with real-time database retrieval. Instead of generating answers from training data (which can hallucinate), RAG systems pull actual documents from databases and generate answers based on retrieved content. CoCounsel's Westlaw integration is RAG — it retrieves real cases before generating analysis.
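To make the pattern concrete, here is a minimal sketch of a RAG loop in Python. The function names (search_case_database, call_llm) and the example question are placeholders, not any vendor's actual API; the point is only that retrieval happens first and the model is instructed to answer from the retrieved text.

```python
# Minimal RAG sketch. search_case_database and call_llm are stand-ins for a real
# case-law database and a real foundation-model API.

def search_case_database(query: str, limit: int = 5) -> list[str]:
    """Placeholder: return excerpts from a real case-law database."""
    return [f"[Excerpt {i} relevant to: {query}]" for i in range(1, limit + 1)]

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to a foundation model and return its answer."""
    return "Draft analysis grounded in the retrieved excerpts."

def answer_with_rag(question: str) -> str:
    # 1. Retrieve real documents first, so the answer is grounded in actual sources.
    excerpts = search_case_database(question)
    # 2. Put the retrieved text in the prompt and tell the model to rely on it alone.
    prompt = (
        "Answer the question using ONLY the excerpts below. "
        "Cite the excerpt you relied on; say 'not found' if the excerpts don't cover it.\n\n"
        + "\n".join(excerpts)
        + f"\n\nQuestion: {question}"
    )
    # 3. Generate. Hallucination risk drops because the model works from retrieved text.
    return call_llm(prompt)

print(answer_with_rag("What is the standard for spoliation sanctions in the Second Circuit?"))
```

The instruction to answer only from the excerpts, and to say 'not found' otherwise, is what separates a grounded RAG answer from a model answering purely from memory.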
Predictive Coding / Technology-Assisted Review (TAR): AI-assisted document review in e-discovery. The AI learns from human reviewer decisions and predicts which documents are relevant, privileged, or responsive. Court-approved since 2012 (Da Silva Moore v. Publicis Groupe).
Continuous Active Learning (CAL): An advanced form of predictive coding where the AI continuously updates its predictions as reviewers code documents, rather than training once on a seed set. The current standard for AI-assisted document review.
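A stripped-down sketch of the CAL loop follows, with a toy 'model' (a keyword set learned from documents coded relevant) standing in for the statistical classifier a real TAR platform would use. The document texts and the reviewer's decision rule are invented for illustration.

```python
# Toy CAL loop: always review the highest-ranked uncoded document, then update the
# model with whatever the reviewer decided. Everything here is a simplified stand-in.

documents = {
    1: "email about the merger negotiation and pricing terms",
    2: "cafeteria menu for the week",
    3: "draft merger agreement circulated to the board",
    4: "holiday party invitation",
}
labels = {}              # doc_id -> True (relevant) / False, filled in by human reviewers
relevant_terms = set()   # the 'model': terms learned from documents coded relevant

def score(doc_id: int) -> int:
    # Higher score = more likely relevant, based on overlap with learned terms.
    return len(relevant_terms & set(documents[doc_id].split()))

def human_review(doc_id: int) -> bool:
    # Placeholder for a reviewer's coding decision.
    return "merger" in documents[doc_id]

while len(labels) < len(documents):
    uncoded = [d for d in documents if d not in labels]
    next_doc = max(uncoded, key=score)          # review the current best candidate first
    labels[next_doc] = human_review(next_doc)
    if labels[next_doc]:
        relevant_terms |= set(documents[next_doc].split())  # continuous model update

print(labels)
```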
AI Disclosure / AI Certification: Court-required statements in filings certifying whether and how AI was used in preparing the filing. As of 2026, 300+ federal judges require some form of AI disclosure. Requirements vary from simple certifications to detailed descriptions of AI tools and verification steps.
Prompt Engineering: Crafting effective instructions for AI tools to produce useful output. In legal practice, this means structuring queries with jurisdiction, applicable standards, relevant facts, and desired output format. Good prompt engineering is the difference between useful AI output and garbage.
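As an illustration, here is one way a well-structured research prompt might be laid out (wrapped in Python only so the examples in this glossary stay in one language; the jurisdiction, rule, and facts are placeholders, not a recommended template):

```python
# An illustrative prompt skeleton. Every field below is a placeholder to adapt.
research_prompt = """
Role: You are assisting a litigation associate.
Jurisdiction: New York state courts.
Legal standard: Motion to dismiss under CPLR 3211(a)(7).
Facts: [two-to-three sentence summary of the relevant facts]
Task: Identify the leading cases on whether these facts state a claim, and note any splits.
Output format: Bullet-point memo, with full citations I can verify in Westlaw or Lexis.
"""
print(research_prompt)
```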
Context Window: The amount of text an AI can process in a single interaction. Measured in tokens (roughly 0.75 words per token). Claude: 200K tokens (~150K words). Gemini: 1M tokens. GPT-4: 128K tokens. Larger context windows allow analysis of longer documents without chunking.
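A quick back-of-the-envelope check, using the rough 0.75 words-per-token ratio above, shows how to tell whether a long document fits a given window without chunking. The word count is an illustrative figure; real tokenizers vary.

```python
# Rough check of whether a long document fits a context window in one pass.
# Uses the approximate 0.75 words-per-token ratio quoted above.

def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    return round(word_count / words_per_token)

deposition_words = 60_000                            # e.g., a long transcript (illustrative)
tokens_needed = estimate_tokens(deposition_words)    # roughly 80,000 tokens

for model, window in [("Claude", 200_000), ("GPT-4", 128_000), ("Gemini", 1_000_000)]:
    verdict = "fits in one pass" if tokens_needed <= window else "needs chunking"
    print(f"{model}: {tokens_needed:,} tokens vs. {window:,}-token window: {verdict}")
```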
Data and Security Terms
Zero Data Retention: Vendor policy where user inputs and AI outputs are not stored after the session ends. The gold standard for legal AI data handling. Enterprise versions of Claude, ChatGPT, and most legal-specific tools offer this.
Data Training Opt-Out / No-Training Commitment: Guarantee that the vendor won't use your inputs to train or improve their AI models. Critical for maintaining client confidentiality. Consumer-tier AI tools often train on user data; enterprise tiers typically don't.
SOC 2 Type II: Security certification demonstrating that a vendor's controls for data security, availability, processing integrity, confidentiality, and privacy have been tested and verified over a period (usually 12 months). The minimum security certification law firms should require from AI vendors.
Data Residency: The physical location where data is processed and stored. Relevant for GDPR compliance (EU data must stay in the EU unless adequate protections exist), government matters, and cross-border confidentiality. Specify data residency requirements in AI vendor contracts.
On-Premise Deployment: Running AI models on your firm's own servers rather than in the vendor's cloud. Eliminates data-sharing concerns but requires significant IT infrastructure. Available for open-source models (Llama, Mistral) and some enterprise AI tools.
API (Application Programming Interface): The technical interface for accessing AI models programmatically. Firms with development capability use APIs to build custom AI tools. API access typically offers more control over data handling than consumer interfaces.
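Conceptually, API access looks like the sketch below: a structured request sent over HTTPS with an authentication key, returning structured output your own systems can log and audit. The endpoint URL, header, and payload fields are generic placeholders, not any particular vendor's documented API.

```python
# Generic sketch of calling an AI model over an API. Endpoint and fields are
# hypothetical; requires the third-party 'requests' package.
import requests

API_URL = "https://api.example-ai-vendor.com/v1/generate"  # hypothetical endpoint
API_KEY = "your-api-key"  # in practice, load this from a secrets store, never hard-code it

payload = {
    "prompt": "Summarize the indemnification clause in the contract text below.",
    "max_tokens": 500,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)

# The answer comes back as structured data (JSON) that your own systems can log,
# audit, and route: the control that consumer chat interfaces don't give you.
print(response.json())
```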
Tokenization: The process of breaking text into 'tokens' that AI models process. Important for pricing (API costs are per-token) and for understanding context window limits. Legal documents are typically token-dense due to specialized vocabulary.
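Because API pricing is per token, a little arithmetic turns a document's length into an approximate cost. The per-1,000-token rates below are placeholders, not real prices; check the vendor's current price sheet.

```python
# Turning document length into an approximate API cost. Rates are placeholders.

def api_cost(input_tokens: int, output_tokens: int,
             rate_in_per_1k: float, rate_out_per_1k: float) -> float:
    return (input_tokens / 1000) * rate_in_per_1k + (output_tokens / 1000) * rate_out_per_1k

# A 40-page contract of ~20,000 words is roughly 27,000 tokens at 0.75 words per token.
estimate = api_cost(input_tokens=27_000, output_tokens=2_000,
                    rate_in_per_1k=0.01, rate_out_per_1k=0.03)  # placeholder rates
print(f"Estimated cost for one review pass: ${estimate:.2f}")
```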
Ethics and Regulatory Terms
Model Rule 1.1 (Competence): ABA Model Rule requiring lawyers to provide competent representation, including staying current with technology. Increasingly interpreted to require understanding AI tools relevant to your practice. The ethical foundation for legal AI adoption.
Model Rule 5.3 (Responsibilities Regarding Nonlawyer Assistance): Requires lawyers to supervise nonlawyer assistants, and is increasingly interpreted to include AI tools. You're responsible for AI output the same way you're responsible for a paralegal's work.
Unauthorized Practice of Law (UPL): Providing legal services without a license. AI tools that provide legal advice directly to consumers raise UPL concerns. AI tools used by licensed attorneys as assistants don't — the attorney is the one practicing law.
Algorithmic Bias: Systematic errors in AI output that reflect biases in training data. In legal AI, this can manifest as biased risk assessments, sentencing recommendations, or hiring algorithms. Relevant to both AI use and AI regulation.
Explainability / Interpretability: The ability to understand why an AI reached a particular conclusion. Important for legal AI because attorneys must be able to explain their reasoning to courts and clients. LLMs are notoriously non-explainable — they produce answers but can't fully explain their reasoning process.
AI Governance Framework: A firm's comprehensive structure for managing AI use — including policies, procedures, training, oversight, and accountability. More comprehensive than an AI policy alone. Leading firms have dedicated AI governance committees.
Constitutional AI (CAI): Anthropic's training approach for Claude, in which the model is trained to critique and revise its own outputs against a written set of principles (a 'constitution') emphasizing helpfulness, harmlessness, and honesty, rather than relying solely on human feedback about individual outputs. The result is an AI that tends to be more cautious and to flag ethical concerns, which is a feature in legal applications.
Business and Market Terms
Legal Tech vs. Legal AI: Legal tech encompasses all technology in legal practice — practice management, e-billing, document management. Legal AI is the subset using artificial intelligence — LLMs, machine learning, NLP. Not all legal tech is AI; much of what's marketed as AI is traditional automation.
Vertical AI: AI built for a specific industry, like legal AI. Harvey and CoCounsel are vertical AI. Claude and GPT-4 are horizontal (general-purpose) AI used in vertical applications.
AI Wrapper: A product that adds a user interface and features on top of a foundation model's API without significant proprietary technology. Some legal AI products are sophisticated wrappers around Claude or GPT-4 with legal prompts and integrations. Not necessarily bad — but evaluate whether the wrapper adds enough value to justify the premium over using the foundation model directly.
Total Cost of Ownership (TCO): The full cost of an AI tool including licensing, training, integration, support, and productivity impact. A $500/month tool that saves 20 hours of fee-earner time each month delivers far more net value than a free tool that saves 2 hours, even though its sticker price is higher; the sketch below works through the numbers. Evaluate AI tools on total cost and value, not sticker price.
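A worked version of that comparison, assuming a $300/hour blended rate (purely illustrative; substitute your firm's own rates and hours):

```python
# Monthly net-value comparison behind the TCO point above. All figures are illustrative.

BLENDED_RATE = 300  # dollars per hour of fee-earner time (assumption)

def monthly_net_value(license_cost: float, hours_saved: float) -> float:
    return hours_saved * BLENDED_RATE - license_cost

print(monthly_net_value(license_cost=500, hours_saved=20))  # 5500.0: $500 tool, 20 hours saved
print(monthly_net_value(license_cost=0, hours_saved=2))     # 600.0: free tool, 2 hours saved
```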
Time-to-Value: How quickly an AI tool delivers measurable benefits. Tools with high time-to-value (immediate use, minimal training) are better for boutique and mid-size firms. Tools requiring extensive customization and training (low time-to-value) may be better for large firms with implementation resources.
Vendor Lock-In: Dependency on a specific AI vendor that makes switching costly. Prompt libraries built for Claude don't transfer perfectly to GPT-4. Custom integrations with Harvey don't work with CoCounsel. Build workflows and knowledge assets that are as model-agnostic as possible.
The Bottom Line: Knowing the vocabulary doesn't make you an AI expert — but it prevents you from being misled by vendors, confused by colleagues, or blindsided by court requirements. This glossary covers the terms you'll encounter in every AI vendor demo, ethics CLE, and technology committee meeting. Bookmark it and reference it when someone uses a term you're not sure about. The managing partner who can distinguish RAG from fine-tuning makes better purchasing decisions than the one who nods along.
AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.
