This summer, artificial intelligence (“AI”) developers notched their first major fair use victory when U.S. District Judge William Alsup issued a split ruling on whether AI companies like Anthropic may legally train large language models (LLMs) on copyrighted books without permission from authors, publishers, or copyright owners.
The Court held that Anthropic’s use of copyrighted books for training, as well as its digitization of lawfully purchased print copies, qualified as fair use because the training process was “exceedingly transformative,” likening it to the way humans read, absorb, and draw inspiration from literature.
Anthropic’s conduct, which the company argued was part of a broader effort to build a world-class large language model, did not escape the Court’s criticism, however. Judge Alsup ruled that Anthropic’s use of more than 7 million pirated books to build and maintain a permanent internal library was not fair use, rejecting the argument that AI companies can “take all the books in the world” forever and without compensation.
The parties recently agreed to resolve these piracy claims through a settlement under which Anthropic will pay $1.5 billion into a settlement fund. The settlement, one of the largest copyright recoveries in U.S. history, covers approximately 500,000 books. To be eligible, a work must have a publishing identification number (such as an ISBN) and must have been registered with the U.S. Copyright Office either before Anthropic downloaded it or within three months of its first publication; registration must also have been completed within five years of publication for the work to benefit from the relevant legal presumptions. Anthropic is also required to destroy all of the pirated copies in its possession.
Judge Alsup expressed concerns that the settlement lacked clarity about which authors and works were covered, how notice and claims would be administered, and whether the deal truly served authors. He warned that vague class definitions could lead to future litigation.
The Anthropic case illustrates the ongoing tension between technological innovation and existing intellectual property rights. While the ruling indicates that AI training may qualify as transformative fair use when the copyrighted materials are obtained lawfully, it also underscores the limits of that defense when pirated works are involved. This landmark decision offers important early guidance on how AI developers must balance innovation against creators’ intellectual property rights.
SETTLEMENT UPDATE:
Judge Alsup has granted preliminary approval of Anthropic’s $1.5 billion settlement in the closely watched copyright case brought by authors over the company’s use of pirated books to train its AI models. Although the court initially flagged concerns about the settlement’s clarity and its fairness to the affected authors, preliminary approval clears the way for Anthropic to resolve claims tied to its downloading of nearly 500,000 books from pirated online libraries. The agreement requires Anthropic to destroy all original and copied files and to compensate authors at roughly $3,000 per work. The deal came after Anthropic acknowledged the staggering risk of facing up to $1 trillion in statutory damages at trial. The court’s decision is likely to set the tone for how future copyright disputes are handled in the AI era, underscoring that companies racing to build effective, competitive tools cannot sidestep their legal and ethical obligations.
Disclaimer: This article was originally published on September 15 and was updated on September 26 to reflect the recent settlement.