AI Woes: Learning Without Permission

Caldwell

Companies that provide generative artificial intelligence (“AI”) services, such as Meta Platforms Inc. (“Meta”) and OpenAI Inc. (“OpenAI”), are facing a slew of lawsuits brought by prominent authors claiming infringement of their copyrights. The authors claim that these companies used copyrighted materials, without seeking permission or otherwise having a right to do so, to create the datasets on which they trained their large language models. A class action suit brought by Michael Chabon and others against Meta in the U.S. District Court for the Northern District of California is the latest of these cases alleging copyright infringement.[1]

This and other cases raise several interesting questions. What is the basis for the infringement claims brought against AI companies? What defenses could companies like Meta and OpenAI rely on against these claims? Are there any best practices that may emerge from these cases?

Copyright Infringement

In the most recent suit against AI companies, brought by Chabon and other prominent authors, the authors allege that Meta trained its large language model on a vast number of books, copied in their entirety, without having obtained permission, and that it did so in part by using so-called “shadow libraries,” which provide access to illegally copied and uploaded books.[2] The authors brought direct and vicarious copyright infringement claims based on the training of the models and on the use of unauthorized copies to create infringing derivative works when a model is used and generates outputs.[3]

Defense: Fair Use

Whether or not an unauthorized use qualifies as fair use depends on four factors, namely:

  • the purpose and character of the use (whether it is transformative, and whether it is commercial or non-commercial),
  • the nature of the copyrighted work (creative vs. factual),
  • the amount and substantiality of the copyrighted work that is used, and
  • the effect of the use on the value of and market for the copyrighted work.

Without an in-depth analysis of each factor, there are two particularly interesting things to consider here. First, it is likely that many AI companies will attempt to argue that their use, although often commercial in nature, is transformative under the first factor. In a submission to the USPTO, OpenAI stated:

Works in training corpora were meant primarily for human consumption for their standalone entertainment value. The “object of the original creation,” in other words, is direct human consumption of the author’s expression. Intermediate copying of works in training AI systems is, by contrast, “non-expressive”: the copying helps computer programs learn the patterns inherent in human-generated media. The aim of this process—creation of a useful generative AI system—is quite different than the original object of human consumption. The output is different too: nobody looking to read a specific webpage contained in the corpus used to train an AI system can do so by studying the AI system or its outputs. The new purpose and expression are thus both highly transformative.[4]

It should be remembered that transformativeness has often been held to require something more than reproduction for consumption, for example, communicating information about the underlying work. Further, although the commercial nature of a use will not, on its own, necessarily prevent reliance on the fair use doctrine, it is “to be weighed against the degree to which the use has a further purpose or different character.”[5] Accordingly, the Supreme Court held in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith that where the copyrighted work and the unauthorized use share the same or highly similar purposes, and the unauthorized use is commercial, the first factor is likely to weigh against fair use absent some other justification for copying. Conversely, the first factor could weigh in favor of the AI companies if courts hold that the unauthorized use has a purpose distinct from that of the original, copyrighted work.

Second, in its submission to the USPTO, OpenAI argued that, under the third factor, what matters is not the amount of the copyrighted work that was copied but the amount that was made available to the public.[6] OpenAI admits that it needs to use works in their entirety to create accurate AI but argues that this factor should not weigh against it when faced with infringement claims, as it is not making the training data available to the public. Such an understanding could, arguably, substantially narrow the scope of the reproduction right by requiring that reproductions be distributed before the reproduction right can be found to have been infringed. That said, the third factor is concerned with whether the reproduction serves as a substitute for the copyrighted work. For this reason, it was held in Authors Guild v. Google, Inc., a case where Google was sued for copyright infringement for digitizing millions of books to create search and snippet functions, that the third factor weighed in Google’s favor even though Google made unauthorized copies of entire works, as those copies were not revealed to the public.[7]

Best Practices

Generally, you should refrain from using someone else’s materials to train AI unless you have express permission, and you should seek qualified advice before proceeding. There are a few principles that can guide you when determining what material to train a generative AI model on.

Copyright protection is limited in duration. Although the length and scope of protection vary from country to country, protection does lapse after a certain amount of time. In the U.S., for example, protection for most works created after January 1, 1978, lasts for the author’s life plus 70 years. The copyright term for works published before 1978 varies depending on several factors. Once the copyright lapses, the work falls into the public domain. Works in the public domain can be used, for example, to train a machine learning model.
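As a rough illustration of the life-plus-70 arithmetic only, the sketch below computes the first day on which a work by a single individual author would be in the U.S. public domain. It assumes the work was created on or after January 1, 1978, is not a work made for hire or an anonymous or pseudonymous work, and that the term runs through the end of the calendar year in which it would otherwise expire (17 U.S.C. § 305). The function names are hypothetical, and the result is no substitute for a review of the specific work.

```python
from datetime import date

# Term stated in the text for most works created on or after January 1, 1978.
LIFE_PLUS_YEARS = 70


def first_public_domain_day(author_death_year: int) -> date:
    """Return the first day the work is in the U.S. public domain.

    Assumption: a single individual author and a life-plus-70 term that
    runs through December 31 of the year in which it would otherwise expire.
    """
    last_protected_year = author_death_year + LIFE_PLUS_YEARS
    return date(last_protected_year + 1, 1, 1)


def is_public_domain(author_death_year: int, on: date) -> bool:
    """Check whether the work is in the public domain on the given day."""
    return on >= first_public_domain_day(author_death_year)
```

For an author who died in 2000, for example, protection under these assumptions would run through December 31, 2070, and the work would enter the public domain on January 1, 2071.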

Much like software that can be used under an open-source license, certain copyrighted works may be used under a public license, such as the various Creative Commons (“CC”) licenses.[8] The scope of permitted uses varies and should be reviewed closely before using the work. Such licenses can nonetheless provide access to more recently created works than those that have entered the public domain through lapse of copyright.

Works that are currently protected by copyright and are not offered to the public for use and access may be used with the copyright holder’s express permission. Further, it is possible to use a copyright-protected work without the copyright holder’s permission, for example, if the fair use doctrine applies. This is a highly fact-dependent determination that must be made on a case-by-case basis, as illustrated above.


[1] Class Action Complaint, Michael Chabon et al. v. Meta Platforms Inc., 4:23-cv-04663, (N.D. Cal., Sep. 12, 2023).

[2] Id. at paras. 22 – 39.

[3] Id. at paras. 52 – 64.

[4] OpenAI, LP, Comment Regarding Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation, Before the United States Patent and Trademark Office, Department of Commerce, Docket No. PTO–C–2019–0038, at 5, https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf (last visited Nov. 1, 2023).

[5] Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508, 143 S. Ct. 1258, 1276, 215 L. Ed. 2d 473 (2023).

[6] OpenAI, supra note 4, at 6-7.

[7] Authors Guild v. Google, Inc., 804 F.3d 202, 221-222 (2d Cir. 2015).

[8] Creative Commons, About CC Licenses, https://creativecommons.org/share-your-work/cclicenses (last visited Nov. 1, 2023).

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations.

© Caldwell | Attorney Advertising
