Background
Artificial intelligence models, particularly large language models (LLMs), are trained on vast quantities of textual content, much of which is protected by copyright. This has raised critical legal questions about whether and when the use of such material for AI training requires permission from the copyright owner under U.S. copyright law. Two recent rulings in the U.S. District Court for the Northern District of California – Bartz v. Anthropic and Kadrey v. Meta – offer the first significant judicial guidance on how courts are likely to apply the fair use doctrine to AI training practices.
Fair use is an affirmative defense to copyright infringement under U.S. copyright law (17 U.S.C. § 107) that permits limited use of copyrighted material without permission from the rights holder, typically for purposes such as criticism, comment, news reporting, teaching, scholarship, or research. Courts evaluate the applicability of the fair use defense by balancing four factors: the purpose and character of the use (including whether it is commercial and whether it is “transformative” – i.e., adds new expression or meaning), the nature of the original work, the amount and substantiality of the portion used, and the effect of the use on the market for the original work. No single factor is determinative, though courts often treat market harm as the most important.
Bartz v. Anthropic
On June 24, 2025, Judge William Alsup of the Northern District of California delivered a landmark ruling in a copyright dispute between several literary authors and Anthropic, an AI company. The case centered on Anthropic’s practice of acquiring millions of books – some by purchasing print editions, scanning them, and destroying the originals; others by downloading pirated digital copies from unauthorized online sources – to create a central digital library and to train its LLMs. The court’s analysis under the U.S. fair use doctrine (Section 107 of the Copyright Act) was nuanced and highly instructive for both AI developers and rights holders.
Key rulings
- Training AI models as fair use: The court found that using lawfully acquired copies of books to train AI models is a “spectacularly transformative” fair use. The purpose – learning statistical relationships between text fragments to enable generative outputs – was distinct from the expressive, human-readable purpose of the original works.
- Format-shifting of lawfully acquired works: The court also addressed Anthropic’s practice of purchasing print books, destroying the originals, and scanning them to create digital copies for internal use. This “format-shifting” was found to be a fair use, as it did not result in additional copies being distributed or made available to the public, but simply replaced a physical copy with a digital one for the same internal purpose.
- Pirated copies and central libraries: In sharp contrast, the court held that creating a permanent library of pirated books – even if not all were used in training – was not a fair use. Retention of the pirated copies alone harmed the market for the originals, and the court left open the possibility of damages for past infringement involving pirated works, including statutory damages for willful infringement.
- Market harm: The court squarely rejected the rights holders’ argument that, in assessing whether the training-phase copying qualified as fair use, it should evaluate the potential for the trained Anthropic model to generate works that compete with or displace the market for the originals. Judge Alsup dismissed this theory as speculative, likening it to complaining that teaching schoolchildren to write well will result in more competing works, and held that this kind of “competitive or creative displacement” falls outside the type of market harm contemplated by the Copyright Act. Instead, the court focused its fair use analysis on whether the copies made during training themselves competed with or substituted for the original works in the market, and found they did not.
Commercial implications
Even if the end use is transformative, acquisition matters. Companies that train on unlawfully sourced content – or even merely store it – face meaningful liability exposure. Copyright owners should strongly consider including clauses in licensing agreements and terms of use that expressly prohibit AI training; such clauses may strengthen infringement claims by making any use of the content for training purposes clearly unauthorized.
Kadrey v. Meta
Shortly after the Anthropic ruling, the U.S. District Court for the Northern District of California considered a similar dispute, this time involving Meta Platforms, Inc. (the parent company of Facebook and Instagram). In this case, 13 literary authors sued Meta for downloading their books from “shadow libraries” (unauthorized online repositories) and using them to train its LLM, known as Llama.
Key rulings
- Fair use found – on this record, not on the merits of AI training generally: The court emphasized that while generative AI may present serious risks to creative markets, the plaintiffs in this case failed to prove that Meta’s models reproduced their works or caused actual direct or indirect market harm. Without that showing, their infringement claim could not overcome Meta’s fair use defense.
- Not a precedent of legality: The Meta court did not reach a legal conclusion on whether training on unlawfully obtained content is permissible. Rather, the decision turned on the plaintiffs’ failure to develop a factual record showing infringement or substitution. Future plaintiffs could prevail on similar facts, but only if they clearly and specifically demonstrate a negative impact on the market for the copyrighted works.
- Transformative use weighed against market effect: As in Anthropic, the court found the use of copyrighted materials in the training phase to be transformative. In finding fair use, however, the court here was less concerned with whether the source material had been obtained lawfully. The decision emphasized that fair use is a fact-specific, holistic inquiry, with the most important factor being the effect on the market for the original work.
- Market harm: In contrast to the Anthropic ruling, the court in Meta was far more persuaded by the argument that Meta’s use of copyrighted works to train its LLM could significantly harm the market for the original works, noting that the “market dilution” caused by a marketplace flooded with AI-generated works could crowd out many authors. The court observed that this indirect market impact theory may be more persuasive in some cases than others: well-established authors are less likely to be crowded out by AI market dilution than up-and-coming authors, and authors of fiction are less likely to be displaced than authors of non-fiction, given that non-fiction works are more “functional” and less “creative.” Ultimately, however, the rights holders provided almost no evidence supporting this theory of market harm in this particular case, and the court left the door open for future arguments along this line.
Commercial implications
The Meta decision underscores the vital importance of proving market harm, not just alleging it. For now, AI developers may prevail without licenses unless plaintiffs can build a detailed record of substitution or revenue loss. Copyright owners would be well served by proactively developing evidence of market impact, which may include tracking where AI-generated content is used instead of copyrighted works in revenue-generating contexts (e.g., sync placements, streaming platforms), documenting licensing negotiations or refusals, and establishing the existence of a licensing market for LLM training.
Comparison and key takeaways
Both cases recognize that training AI models is a highly transformative use, distinct from traditional copying or distribution. Neither grants rights holders an automatic entitlement to licensing fees for AI training or a right to restrict use of their works for this purpose, especially where the use does not directly substitute for the original work. However, the Anthropic decision draws a firmer line against the use of pirated material, exposing AI developers to liability for unauthorized copying regardless of any subsequent transformative use, while the Meta decision places a greater burden on copyright owners to establish that LLM training negatively affects the market for their works.
The Anthropic and Meta decisions also took markedly different stances on the availability of an indirect market harm argument stemming from the use of copyrighted works in LLM training itself. The Anthropic court quickly disposed of this argument, refusing to link the copies made during training to any market substitution or displacement resulting from the output capabilities of the newly trained generative AI tool. The Meta court, by contrast, was more receptive to the theory of “indirect substitution,” suggesting that the potentially dilutive effect of AI-generated content on the marketplace could be precisely the kind of market harm that U.S. copyright law aims to prevent, and one that may weigh against fair use under the critical market effect factor.
What these cases don’t decide
- No established right to licensing fees: Both courts accepted fair use as a legitimate defense to the unauthorized use of copyrighted materials in the AI training process, and thus rejected the notion that copyright owners are inherently entitled to a licensing market for AI training absent proof of actual market harm.
- No precedent for legality of pirated training data: The ruling in Meta rests on the plaintiffs’ insufficient showing of market harm and should not be interpreted as giving AI developers a green light to use pirated content for AI training.
- No consensus on market harm caused by AI training: The courts in Anthropic and Meta took vastly different approaches to evaluating the kinds of market harm that may result from training AI models on copyrighted content. This divergence highlights the need for appellate rulings on point and underscores the lack of consensus on how the market harm factor will be evaluated in future decisions.
- Training vs. outputs – a liability gap: Even if model training qualifies as fair use, developers may still face liability if the model outputs infringing content – especially music (lyrics, melodies, vocal likenesses). These cases did not address output-based infringement, leaving a major risk area unsettled.
Authority of rulings
While these cases mark a substantial development in the fair use approach to AI training, these two district court decisions are not binding on other district courts, on courts in other circuits, or on the Ninth Circuit Court of Appeals. Though they may be persuasive, another judge faced with similar facts could reach a different conclusion, and until a higher authority weighs in, the door remains open for other courts to develop an approach to these issues that differs from the analysis in these cases. If either case is appealed, a Ninth Circuit opinion would become binding precedent on all federal courts in Alaska, Arizona, California, Hawaii, Idaho, Montana, Nevada, Oregon, and Washington. Should the issue reach the Supreme Court, a decision there would set nationwide precedent.