Predictive coding is a machine learning technique used in e-discovery that trains an algorithm to classify documents as relevant or irrelevant based on a human reviewer's decisions. Instead of having attorneys review every document in a massive dataset, predictive coding lets the machine learn what "relevant" looks like and apply that judgment at scale.

Also called technology-assisted review (TAR), predictive coding has been court-accepted since 2012 (Judge Peck's landmark ruling in *Da Silva Moore v. Publicis Groupe*) and is now the standard for large-scale document review. Studies consistently show it's more accurate than human-only review — not just faster.


How Predictive Coding Works

The process follows a train-and-apply model:

1. A senior attorney reviews a "seed set" of documents (typically 1,000-2,000), coding each as relevant or not.
2. The algorithm learns patterns from those decisions: not just keywords, but contextual relationships between terms, document types, and metadata.
3. The algorithm scores every document in the collection on a relevance scale (usually 0-100).
4. Attorneys set a cutoff threshold and review documents above it; those below are designated as non-relevant.
5. Quality control sampling validates the algorithm's decisions.

The entire cycle can be iterated: each round of human correction makes the algorithm more accurate.
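The steps above can be sketched with a toy classifier. This is a minimal from-scratch naive Bayes, not any vendor's actual model; the seed set, collection, and 50-point cutoff are all invented for illustration:

```python
from collections import Counter
import math

# Hypothetical seed set: (document text, attorney's relevance call). (Step 1)
seed_set = [
    ("merger pricing agreement with competitor", True),
    ("quarterly pricing strategy memo", True),
    ("lunch order for the team offsite", False),
    ("holiday party planning thread", False),
]

def tokenize(text):
    return text.lower().split()

def train(seed):
    # Step 2: learn per-word counts from the attorney's coding decisions.
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in seed:
        docs[label] += 1
        counts[label].update(tokenize(text))
    return counts, docs

def score(text, counts, docs, vocab_size):
    # Step 3: naive Bayes log-odds, squashed to a 0-100 relevance score.
    log_odds = math.log(docs[True] / docs[False])
    for w in tokenize(text):
        p_rel = (counts[True][w] + 1) / (sum(counts[True].values()) + vocab_size)
        p_irr = (counts[False][w] + 1) / (sum(counts[False].values()) + vocab_size)
        log_odds += math.log(p_rel / p_irr)
    return 100 / (1 + math.exp(-log_odds))

counts, docs = train(seed_set)
vocab = {w for c in counts.values() for w in c}

collection = ["pricing discussion with competitor", "team lunch menu"]
cutoff = 50  # Step 4: review everything scoring above the cutoff
for doc in collection:
    s = score(doc, counts, docs, len(vocab))
    print(f"{s:5.1f}  {'REVIEW' if s > cutoff else 'set aside'}  {doc}")
```

A real platform trains on thousands of coded documents and richer features (metadata, document type), but the flow is the same: human decisions in, relevance scores out, cutoff applied.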

Accuracy vs. Human Review

This is where managing partners need to pay attention. The TREC Legal Track studies (run by NIST) consistently show predictive coding achieving recall rates of 75-85%, while manual human review averages 55-65% recall. That means predictive coding finds more of the relevant documents than teams of contract reviewers do. The main reason is human fatigue: after eight hours of document review, human accuracy drops sharply. Algorithms don't get tired, and they don't get distracted by irrelevant but interesting documents. The cost difference is equally dramatic: predictive coding can review a million-document collection at $0.10-0.50 per document, compared to $1.50-3.00 per document for human review.
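A quick worked example makes the recall gap concrete. The collection size and 3% richness are hypothetical, and the recall rates are taken from the middle of the ranges cited above:

```python
# Illustrative numbers only: a hypothetical 1M-doc collection, 3% richness.
collection_size = 1_000_000
relevant = round(collection_size * 0.03)   # 30,000 truly relevant documents

# Assumed recall rates, drawn from the ranges cited in the text.
tar_recall, manual_recall = 0.80, 0.60
tar_found = round(relevant * tar_recall)
manual_found = round(relevant * manual_recall)

print(f"TAR finds    ~{tar_found:,} of {relevant:,} relevant docs")
print(f"Manual finds ~{manual_found:,} of {relevant:,} relevant docs")
print(f"Relevant docs a manual-only review leaves behind: ~{tar_found - manual_found:,}")
```

On these assumptions, manual review misses roughly 6,000 relevant documents that TAR would have surfaced.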

Predictive coding has been court-approved for over a decade. Key rulings:

- *Da Silva Moore v. Publicis Groupe* (2012): first federal court approval of predictive coding.
- *Rio Tinto v. Vale* (2015): Judge Peck held that TAR is "now clearly accepted" and may be more accurate than manual review.
- *In re Broiler Chicken Antitrust Litigation* (2018): court ordered the use of TAR over the opposing party's objection.
- *Livingston v. City of Chicago* (2022): court required disclosure of TAR methodology and seed set composition.

The trend is clear: courts don't just allow predictive coding, they increasingly expect it for large-scale reviews. Insisting on manual review of 5 million documents when TAR is available may itself be seen as unreasonable.

TAR 1.0 vs. TAR 2.0 (Continuous Active Learning)

TAR 1.0 uses the seed-set approach: train on a fixed set of documents, then apply to the full collection. It's effective but static — the model doesn't improve after initial training. TAR 2.0 (Continuous Active Learning or CAL) is the current standard. Instead of a single training round, the algorithm continuously learns from every document a reviewer codes. It prioritizes the most informative documents for human review — the ones where the algorithm is least certain. This feedback loop means the model improves throughout the review, and the reviewer's time is spent on the most impactful documents. TAR 2.0 consistently outperforms TAR 1.0 in both accuracy and efficiency.
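The CAL feedback loop can be sketched in a few lines. The `train`, `score`, and `human_codes` functions here are stand-ins invented for illustration (a keyword-overlap "model" and an attorney who flags anything mentioning "pricing"); the point is the loop structure, in which each round retrains on all labels so far and sends the least-certain document to the reviewer:

```python
def train(labeled):
    # Stand-in model: relevance = fraction of a doc's words seen in
    # documents the reviewer has coded as relevant. A real CAL platform
    # would train a proper classifier here.
    relevant_words = set()
    for doc, label in labeled:
        if label:
            relevant_words.update(doc.split())
    def score(doc):
        words = doc.split()
        return sum(w in relevant_words for w in words) / len(words)
    return score

def human_codes(doc):
    # Placeholder for the attorney's judgment call.
    return "pricing" in doc

labeled = [("merger pricing memo", True), ("lunch plans", False)]
unlabeled = [
    "pricing call with competitor",
    "quarterly pricing forecast",
    "parking garage access",
    "team offsite agenda",
]

for _ in range(3):
    score = train(labeled)                      # retrain on every label so far
    # Uncertainty sampling: surface the doc whose score is closest to 0.5,
    # i.e. the one the model is least sure about.
    unlabeled.sort(key=lambda d: abs(score(d) - 0.5))
    next_doc = unlabeled.pop(0)
    labeled.append((next_doc, human_codes(next_doc)))

print(labeled)
```

Contrast with TAR 1.0, where the `train` call would happen exactly once on the seed set and every remaining document would simply be scored by that frozen model.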

When to Use Predictive Coding

Predictive coding makes economic sense when you're reviewing 50,000+ documents. Below that threshold, the setup cost and training time may not justify the investment over linear review. The sweet spot is 250,000 to 10 million documents — datasets large enough that manual review is prohibitively expensive but structured enough for effective training. It's standard practice in antitrust litigation, securities fraud, mass tort, and government investigations where document collections routinely exceed millions. For smaller matters, keyword search with targeted manual review remains more cost-effective. The key metric is the richness of the collection — the percentage of relevant documents. Collections with 1-5% richness benefit most from predictive coding.
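A back-of-envelope break-even calculation shows why the document count drives the decision. The per-document rates are midpoints of the ranges cited earlier; the $50,000 fixed cost for setup and seed-set review is an assumed figure for illustration:

```python
# Illustrative break-even: TAR vs. linear (manual) review.
tar_per_doc, manual_per_doc = 0.30, 2.00  # midpoints of the cited ranges
tar_setup = 50_000                        # assumed fixed cost: setup + seed-set review

for n_docs in (10_000, 50_000, 250_000, 1_000_000):
    tar_cost = tar_setup + n_docs * tar_per_doc
    manual_cost = n_docs * manual_per_doc
    winner = "TAR" if tar_cost < manual_cost else "manual"
    print(f"{n_docs:>9,} docs: TAR ${tar_cost:>11,.0f} vs manual ${manual_cost:>11,.0f} -> {winner}")
```

On these assumptions the crossover lands in the tens of thousands of documents, consistent with the 50,000-document rule of thumb above; with a higher setup cost or cheaper contract reviewers, the break-even point shifts upward.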

The Bottom Line: Predictive coding uses machine learning to classify documents in e-discovery, achieving 75-85% recall vs. 55-65% for human reviewers. Court-accepted since 2012 and now expected for large-scale reviews. TAR 2.0 (continuous active learning) is the current standard. If you're reviewing 50,000+ documents manually, you're overspending and underperforming.

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.