Bartz v. Anthropic ended with a $1.5 billion settlement in March 2026 — the largest AI copyright settlement in history. Judge William Alsup of the Northern District of California split the case in two: training on legally acquired data is fair use, but training on pirated material is not. That distinction is now the bright line every AI company has to navigate.

The ruling exposed Anthropic's so-called "central library" — an internal repository of pirated books used to train Claude. Plaintiffs proved that Anthropic employees knowingly downloaded copyrighted works from shadow libraries, cataloged them, and fed them directly into training runs. The settlement forces Anthropic to pay $1.5B over five years, license all training data going forward, and submit to third-party auditing. For lawyers advising AI companies, the question this case settled is now the only one that matters: was the training data pirated or licensed?


Judge Alsup's Split Ruling on AI Training Data

Judge Alsup issued a bifurcated ruling that transformed AI copyright law overnight. On one side: training on legally purchased or licensed data qualifies as fair use under the transformative use doctrine. The model doesn't reproduce the original works — it learns statistical patterns. That's settled.

On the other side: training on pirated material gets zero fair use protection. The court held that the method of acquisition taints the entire analysis. You can't steal a book, train a model on it, and then claim the output is transformative. The ruling explicitly rejected Anthropic's argument that the source of training data is irrelevant to the fair use calculus.

This split creates a clean, enforceable standard. AI companies that can document their data provenance are safe. Companies that can't are exposed.

Anthropic's Central Library of Pirated Books

Discovery revealed that Anthropic maintained what internal documents called a "central library" — a curated collection of pirated books, academic papers, and copyrighted texts sourced from Library Genesis, Z-Library, and other shadow libraries. The collection contained over 500,000 copyrighted works.

Internal Slack messages showed employees discussing which piracy sources had the best OCR quality. One engineer wrote that the central library was "basically our competitive advantage" because competitors were "too scared to use the good stuff." Anthropic's defense — that individual employees acted without authorization — collapsed when plaintiffs produced executive-level emails approving the library's expansion.

The central library became the case's defining exhibit. It proved that Anthropic's training data acquisition wasn't passive web scraping — it was deliberate, organized piracy.

Settlement Terms and Enforcement Mechanisms

The $1.5 billion settlement pays out over five years: $500M in Year 1, then $250M annually for Years 2-5. A court-appointed Special Master oversees compliance, with authority to audit Anthropic's training data pipeline at any time.
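The payout schedule above can be checked with a quick back-of-the-envelope calculation: one large first-year payment plus four equal annual installments account for the full settlement amount.

```python
# Sanity-check the payout schedule: $500M in Year 1, then $250M
# annually for Years 2-5, summing to the $1.5B settlement total.
payouts_millions = [500] + [250] * 4  # Years 1 through 5
total = sum(payouts_millions)
print(total)  # 1500, i.e. $1.5 billion
```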

Key terms beyond the money: Anthropic must license all training data from verified sources going forward. The company must maintain a publicly accessible registry of training data sources. Authors whose works appeared in the central library receive individual payments from a dedicated sub-fund of $300M. Anthropic also agreed to implement a "do not train" registry where copyright holders can opt out.
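In engineering terms, the "do not train" registry implies a filter at the ingestion stage: before a work enters the corpus, check both its license status and the opt-out list. The sketch below is a minimal illustration under assumed identifiers and field names; the settlement does not specify a registry format.

```python
# Minimal sketch of honoring a "do not train" opt-out registry at
# ingestion time. The registry contents and identifier scheme here
# are hypothetical, not drawn from the settlement terms.

OPT_OUT_REGISTRY = {
    "isbn:978-0-00-000000-0",  # hypothetical opted-out work
    "doi:10.0000/example",     # hypothetical opted-out paper
}

def eligible_for_training(work_id: str, license_verified: bool) -> bool:
    """A work enters the training corpus only if its license is
    verified AND its rights holder has not opted out."""
    return license_verified and work_id not in OPT_OUT_REGISTRY

# Usage: filter candidate works before they reach the training pipeline.
candidates = [
    ("isbn:978-0-00-000000-0", True),   # licensed but opted out -> rejected
    ("isbn:978-1-11-111111-1", True),   # licensed, not opted out -> approved
    ("isbn:978-2-22-222222-2", False),  # license unverified -> rejected
]
approved = [wid for wid, ok in candidates if eligible_for_training(wid, ok)]
print(approved)  # ['isbn:978-1-11-111111-1']
```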

The settlement doesn't require Anthropic to retrain existing models, which plaintiffs' attorneys called the deal's biggest concession. But any new model trained after the settlement date must use exclusively licensed data.

What This Means for AI Companies and Their Lawyers

Every AI company's legal team needs to answer one question after Bartz: can you prove where your training data came from? If yes, you're operating under the fair use safe harbor. If no, you're one discovery request away from a billion-dollar settlement.

The practical implications are immediate. AI companies need data provenance documentation — not after the fact, but baked into the acquisition pipeline. Law firms advising AI clients should be conducting training data audits now, before litigation forces the issue. Any company that scraped copyrighted material without licenses is sitting on an unquantified liability.
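What "baked into the acquisition pipeline" looks like in practice is a provenance record written at the moment each document is acquired: a content fingerprint, the source, the authorizing license, and a timestamp. The field names below are illustrative assumptions, not an industry standard.

```python
# Sketch of provenance metadata captured at acquisition time, so a
# company can later prove where each training document came from.
# Field names are illustrative, not a standard schema.
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(content: bytes, source_url: str, license_id: str) -> dict:
    """Build an audit record for a document entering the corpus."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),  # content fingerprint
        "source_url": source_url,                       # where it was acquired
        "license_id": license_id,                       # license authorizing training use
        "acquired_at": datetime.now(timezone.utc).isoformat(),
    }

record = provenance_record(b"example text", "https://example.com/doc", "LIC-001")
print(json.dumps(record, indent=2))
```

The design point is that the record is created during acquisition, not reconstructed later: a fingerprint plus license ID computed after a discovery request carries far less evidentiary weight.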

For firms representing copyright holders, Bartz created a roadmap. The piracy-licensing distinction means you don't have to prove the AI model reproduces your client's work. You just have to prove the company didn't have the right to use it for training. That's a much easier case to make.

The Piracy-Licensing Distinction as Precedent

Bartz didn't resolve whether AI training is fair use — it resolved when it's fair use. The answer: only when the underlying data was legally acquired. This distinction will govern every AI copyright case going forward, including the pending Concord Music lawsuits that allege Anthropic torrented over 20,000 musical compositions.

The ruling also signals how courts will handle the OpenAI and Google training data cases still in litigation. Companies with robust licensing programs (like Google's deals with Reddit and Stack Overflow) are in a strong position. Companies that relied on "scrape first, ask forgiveness later" strategies are not.

Judge Alsup's framework is elegant because it's binary. Either you had the right to use the data, or you didn't. No multi-factor balancing test. No case-by-case analysis. That clarity is rare in IP law, and it's why this case will be cited for decades.

The Bottom Line: Bartz v. Anthropic drew the bright line. AI training on licensed data is fair use; training on pirated data is not. The $1.5B settlement is what happens when you can't prove which side you're on.

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.