Doe v. GitHub is the first class action challenging AI code generation tools. Programmers sued GitHub, Microsoft, and OpenAI, alleging that GitHub Copilot was trained on their open-source code without complying with license terms. Most claims were dismissed, but the surviving contract and license claims keep alive the question of whether AI coding tools must respect open-source licenses.


Background

GitHub Copilot launched in technical preview in June 2021 and became generally available in June 2022 as an AI-powered coding assistant built on OpenAI's Codex model. It was trained on publicly available code from GitHub repositories, including millions of projects under open-source licenses such as MIT, GPL, and Apache. These licenses allow free use of the code but impose specific conditions: attribution, inclusion of the license text, and, for copyleft licenses like the GPL, sharing derivative works under the same license.

Copilot reproduces code snippets from its training data, sometimes generating blocks that closely match or are identical to existing open-source code. When it does, it typically doesn't include the attribution or license notices required by the original code's license terms. Developers noticed their code appearing in Copilot suggestions without credit.

Anonymous plaintiff 'Doe' and other programmers filed suit in the Northern District of California on November 3, 2022, under Case No. 4:22-cv-06823. The complaint alleged copyright infringement, DMCA violations, breach of open-source license terms, and other claims against GitHub (owned by Microsoft), Microsoft, and OpenAI.

Doe v. GitHub, Inc.
Case No.: 4:22-cv-06823 (N.D. Cal.)
Court: U.S. District Court, Northern District of California
Date Filed: 2022-11-03
Category: AI Liability / Copyright
Sanctions: None
AI Case Law — Updated April 2026

What Happened

The defendants moved to dismiss all 22 claims. Judge Jon S. Tigar largely sided with the defense, dismissing 20 of the 22 claims. The biggest loss for the plaintiffs was the DMCA claim. The court found that the DMCA's provisions on removing copyright management information (CMI) don't straightforwardly apply when an AI generates code that lacks the original's attribution. The AI isn't 'removing' attribution in the traditional sense; it's generating new output that never included it.

The copyright infringement claims also fell. The court found the plaintiffs hadn't adequately alleged that specific copyrighted works were reproduced in specific Copilot outputs. AI code generation involves transformation and recombination, making it hard to draw a direct line from input to output without more specific allegations.

Two claims survived: open-source license violation and breach of contract. These claims rest on the argument that by using code governed by open-source licenses, GitHub and its partners agreed to the license terms, and Copilot's output violates those terms by failing to include required attribution and license text. The case continues on these narrower grounds.


The Ruling

Judge Tigar's ruling drew important lines. On the DMCA: stripping attribution from AI training data isn't the same as 'removing' copyright management information under the statute, at least not without more specific factual allegations. The DMCA was written for scenarios where someone deliberately removes a copyright notice, not where an AI system generates content that happens to lack one.

On copyright: the plaintiffs needed to identify specific copyrighted code and specific Copilot outputs that reproduced it. Broad allegations that 'Copilot trained on our code' weren't enough. This sets a high bar for copyright claims against AI code generators, requiring plaintiffs to show concrete instances of reproduction.

On the surviving claims: open-source licenses are contracts. If GitHub used code subject to GPL, MIT, or other licenses, and Copilot's outputs don't comply with those license terms, that's a breach. This theory doesn't require showing exact reproduction. It requires showing that the license terms weren't followed. That's a more viable path for the plaintiffs.

Outcome: Judge Tigar dismissed 20 of 22 claims, including the DMCA violation claim (the biggest blow to plaintiffs). The case proceeds on two remaining claims: open-source license violation and breach of contract.

Why This Case Matters

This case defines the legal risks for AI code generation, a multibillion-dollar market. GitHub Copilot, Amazon CodeWhisperer, and similar tools all face the same question: does training on open-source code and generating output without license compliance violate developers' rights? The surviving claims keep that question alive.

The DMCA dismissal was a major win for AI companies. If the court had allowed the DMCA claims to proceed, every AI tool that trains on attributed content and generates output without attribution would face statutory damages of up to $25,000 per violation. That risk is off the table for now, though it could return with better-pleaded allegations.

The open-source license claims matter beyond code. They test whether AI training triggers license obligations. If training an AI on GPL-licensed code means the AI's outputs must also be GPL-licensed, that has implications for every generative AI product trained on licensed content. The open-source community is watching this case closely.


Lessons for Attorneys

For attorneys representing developers: the copyright path is hard. You need to show specific code was reproduced in specific outputs. Start building evidence now. Use Copilot and document instances where it generates code matching your clients' repositories. Screenshots, timestamps, and side-by-side comparisons are essential.

For attorneys advising AI companies: the DMCA and copyright dismissals provide breathing room, but the contract and license claims are real. Audit your training data for license obligations. If your AI trains on GPL, MIT, or Apache-licensed code, understand what those licenses require and whether your product complies. The cost of licensing compliance is far less than the cost of litigation.

For IP attorneys generally: this case highlights the gap between AI capabilities and existing IP law. The DMCA, copyright statutes, and open-source licenses were all written before AI code generation existed. Courts are struggling to apply these frameworks to new technology. That gap creates both risk and opportunity. Firms that help clients navigate this uncertainty will capture significant work as the law develops.


The Bottom Line

Doe v. GitHub dismissed most claims against AI code generation tools but kept alive the question of whether open-source license terms apply to AI training and outputs. The surviving contract and license claims will determine whether AI coding assistants must comply with the licenses of the code they learned from.

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala. For deeper takes and the perspective behind the research, follow me on LinkedIn or email me directly.