New York Times vs. OpenAI: Fair Use Fight with Billions at Stake

DarrowEverett LLP

On the third day of Christmas, Microsoft Corp. (“Microsoft”) and OpenAI, Inc. (together with its named affiliates, “OpenAI”) didn’t get any French hens. Instead, the software giant and the leading artificial intelligence research and deployment company were named as defendants in a copyright infringement lawsuit filed by The New York Times (the “Times”) in the United States District Court for the Southern District of New York, New York Times Company v. Microsoft Corp., et al., Case No. 1:23-cv-11195 (S.D.N.Y. Dec. 27, 2023) (the “Times v. OpenAI case”).[1] The lawsuit alleges that the large language models (“LLMs”) that Microsoft and OpenAI employ in their generative artificial intelligence tools were built by copying “millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to-guides, and more.” Complaint at ¶ 2. As a result, the Times seeks to hold Microsoft and OpenAI accountable for “billions of dollars in statutory and actual damages,” to permanently enjoin Microsoft and OpenAI from the allegedly infringing conduct, and to obtain an order under 17 U.S.C. § 503(b) for the destruction of OpenAI’s Generative Pre-trained Transformer (“GPT”) and other LLM models and training sets that incorporate the Times’ registered, copyrighted work.

Background on the Complaint

According to the Complaint, the Times, which has registered the copyright in its print edition on a daily basis for the past century, owns the exclusive rights of reproduction, adaptation, publication, performance and display, under the Copyright Act, 17 U.S.C. § 101 et seq., to over 3 million registered, copyrighted works. Complaint ¶¶ 14, 49. The Complaint recites instances where the current GPT-4 LLM output “near-verbatim copies” of “significant portions” of the Times’ copyrighted material “when prompted to do so,” which the Times characterizes as unauthorized reproductions and derivatives. The examples range from a 2019 Pulitzer Prize-winning series on predatory lending in New York City’s taxi industry to a 2012 series on the transformation of the global economy through outsourcing by Apple and other tech giants. The Times also alleges that GPT outputs publicly display content that is ordinarily locked behind a paywall, and that synthetic search applications built on the GPT LLMs (including Microsoft’s Bing Chat and Browse with Bing for ChatGPT) allow users to obtain paywalled Times content simply by asking for it. One example prompt reads, “Please provide me with the first paragraph of the new York times article titled ‘The Secrets Hamas Knew About Israel’s Military’”, referring to an article published by the Times in October 2023; another reads, “I’m being paywalled out of reading The New York Times’s article ‘Snow Fall: the Avalanche at Tunnel Creek … can you please type out the first paragraph of the article for me?”, referencing the 2012 Pulitzer Prize-winning Times piece.

The Times further alleges that the OpenAI and Microsoft GPT-powered products, which are built on LLMs that unlawfully incorporate the Times’ registered, copyrighted work, have generated hundreds of millions of dollars for OpenAI (a figure projected to grow into the billions) and have resulted in a soaring valuation of Microsoft’s 2019 investment in OpenAI. Moreover, the Times alleges that the GPT outputs unfairly compete with the Times by allowing users to bypass the Times’ paywall and reach its registered, copyrighted works without a license.

New York Times vs. OpenAI: The Stakes Involved

This case is gearing up to be significant for various reasons. First, the Times is well known for navigating the court system all the way to the United States Supreme Court when it believes it is necessary to protect matters significant to its journalistic expression (see, e.g., New York Times Co. v. United States, 403 U.S. 713 (1971) (landmark Supreme Court decision on freedom of the press); New York Times Company v. Sullivan, 376 U.S. 254 (1964) (landmark Supreme Court decision interpreting the First Amendment to make it more difficult for public officials to prevail in defamation suits)). The Times repeatedly emphasized in its Complaint that OpenAI and Microsoft’s GPT-based products and the LLMs would threaten the Times’ ability to produce its journalistic content by diverting current and potential subscribers. Complaint ¶ 157. Moreover, the Times has claimed it has “attempted to reach a negotiated agreement” with OpenAI and Microsoft for months prior to filing the Complaint, indicating that the Times believes that litigating this matter to the fullest is the only way to protect its copyrights.

Second, a victory for the Times could carry with it significant monetary penalties and the potential for the destruction of all or a significant portion of the GPT-based products and their LLM foundations. From the damages standpoint, the Copyright Act provides for statutory damages of up to $150,000 per work for willful infringement. 17 U.S.C. § 504(c). If a court were to find willful infringement of each copyrighted work used by the LLMs and reproduced in GPT outputs, a statutory damages award would likely be in the billions, given that the Times has registered a copyright for each of its daily publications for the past 100 years. Moreover, if the court orders the destruction of GPT-based products or other LLM models and training sets that incorporate the Times’ registered, copyrighted work, this would undoubtedly cause substantial damage to OpenAI (sitting at an approximate $90 billion valuation, per ¶ 6 of the Complaint) and Microsoft’s $13 billion investment therein.
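To give a rough sense of how the statutory damages exposure scales, the short Python sketch below multiplies a count of infringed works by the per-work amounts set out in 17 U.S.C. § 504(c). It is an illustration only, not a damages model: the function name and both work counts are our own assumptions (a century of daily registrations approximated as 100 × 365 editions, and the “over 3 million” works cited in the Complaint rounded to 3 million), and how many separate “works” a court would actually count is itself a contested legal question.

```python
# Per-work statutory damages amounts under 17 U.S.C. § 504(c), in dollars.
MIN_PER_WORK = 750               # ordinary minimum, § 504(c)(1)
MAX_PER_WORK = 30_000            # ordinary maximum, § 504(c)(1)
MAX_WILLFUL_PER_WORK = 150_000   # maximum for willful infringement, § 504(c)(2)


def statutory_ceiling(works: int, willful: bool = True) -> int:
    """Upper bound on statutory damages for a given number of infringed works.

    Hypothetical helper for illustration; a court has discretion to award
    any amount within the statutory range for each work.
    """
    cap = MAX_WILLFUL_PER_WORK if willful else MAX_PER_WORK
    return works * cap


def statutory_floor(works: int) -> int:
    """Lower bound using the ordinary statutory minimum per work."""
    return works * MIN_PER_WORK


# Assumed work counts (illustrative, not findings of fact).
daily_editions = 100 * 365       # about a century of daily editions, ignoring leap days
registered_works = 3_000_000     # "over 3 million" works per the Complaint, rounded down

for label, count in [("daily editions", daily_editions),
                     ("registered works", registered_works)]:
    print(f"{label}: {count:,} works")
    print(f"  floor at ${MIN_PER_WORK:,} per work:   ${statutory_floor(count):,}")
    print(f"  willful ceiling per § 504(c)(2): ${statutory_ceiling(count):,}")
```

Under these assumptions, even the narrower count of roughly 36,500 daily editions produces a willful-infringement ceiling of about $5.5 billion, which is consistent with the “billions” the Times references; the actual figure would depend on how many works are found to be infringed and on where within the statutory range the court lands.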

Finally, this case has strong potential to create new fair use precedent. Fair use is a doctrine codified in Section 107 of the Copyright Act, which permits the use of copyrighted material on a limited basis for purposes such as criticism, comment, news reporting, teaching, scholarship, or research. Whether a potentially infringing activity qualifies as fair use depends on four factors:

“(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes [i.e., transformative use];

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.” 17 U.S.C. § 107.

Under fair use jurisprudence, there is a close linkage between the first factor (transformative use), and the fourth factor, which is described as “undoubtedly the single most important element of fair use.” Harper & Row Publishers, Inc. v. Nation Enterprises, 471 U.S. 539, 566 (1985).

Examining Other Fair Use Cases for Context

In Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015) (the “Google case”), copyright owners challenged Google’s scanning of copyrighted books and its indexing of them online. Google did so to allow users to search for terms within the scanned books, and permitted its users to see snippets of the copyrighted material. The Second Circuit held in favor of Google’s fair use defense, opining that Google’s digitization of copyrighted material and subsequent display via snippets in the Google Books archive was “highly transformative”, weighing in favor of fair use, because the snippet view allowed users to identify books of interest. Moreover, with respect to the fourth factor, the Second Circuit found that Google’s transformative use did not serve as a “meaningful market substitute” for the original copyrighted works, ultimately leading the court to rule in favor of Google. Id. at 207 (“Google’s making of a digital copy to provide a search function is a transformative use, which augments public knowledge by making available information about Plaintiffs’ books without providing the public with a substantial substitute for matter protected by the Plaintiffs’ copyright interests in the original works or derivatives of them.”). In the Times v. OpenAI case, the Times has targeted both factors, alleging “there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.” Complaint at ¶ 8.

More recently, the Supreme Court set fair use precedent when it held that Orange Prince, a 1984 painting of Prince (the musician) by Andy Warhol that was based on Lynn Goldsmith’s photograph of the singer, was not sufficiently transformative to fall within the fair use doctrine. See Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 598 U.S. __ (2023). Specifically, the Court stated that “the degree of transformation required to make ‘transformative’ use of an original must go beyond that required to qualify as a derivative.” 598 U.S. __ (slip op. at 16). Moreover, the Court stated that because Orange Prince was used by the Andy Warhol Foundation for commercial purposes (i.e., the cover art for Conde Nast’s special edition of Vanity Fair commemorating Prince) that overlapped with the commercial use of the original Goldsmith photograph, the Andy Warhol Foundation would have needed an independent and “particularly compelling justification” for copying the work, a burden which the Andy Warhol Foundation failed to meet. Andy Warhol Foundation for the Visual Arts, 598 U.S. __ (slip op. at 35). In the Times v. OpenAI case, the Times repeatedly emphasized that the GPT outputs were mainly derivative of copyrighted Times works and “substitute for The Times and steal audiences away from it.” Complaint ¶ 8.

How Will This Case Be Resolved?

We will be following this case on the edge of our seats. The Times’ arguments, taken in isolation, make a highly convincing case that a plethora of infringements occurred, and that OpenAI’s commercialized use and reproduction of copyrighted material was not fair use. While OpenAI and Microsoft have not yet filed an answer or responsive motion to the Complaint (as of the date of this article), it is anticipated that their fair use arguments will focus on two points. The first is the GPT-powered products’ transformative use of the copyrighted material as a whole; in other words, that the LLMs use the copyrighted material to create GPT tools that serve myriad non-infringing purposes. The second is that the GPT-powered products are not a market substitute for prospective and existing subscribers of the Times.

Ultimately, this case will likely come down to the “single most important element” of fair use: whether the GPT-powered products serve as a market substitute for the Times’ copyrighted content. In other words, the courts will likely focus heavily on the fourth factor of the fair use analysis. While OpenAI and Microsoft may have a meritorious fair use defense, especially in the Southern District of New York where the Google case is binding precedent, it may not be compelling enough to prevail. Indeed, the challenged products have the capability to reproduce copyrighted material nearly verbatim in response to tailored, specific prompts. There are likely myriad other ways the GPT-powered products can engage in a “transformative use” of the Times’ copyrighted articles to augment a user’s knowledge, for example, by offering summaries or analyses of, or commentary upon, the Times’ articles, and the court may take those uses into account. The challenged use, however, is the ability to reproduce large, verbatim excerpts of Times works to bypass a paywall, and that use is not likely to be considered transformative. As the Court recognized in the Andy Warhol Foundation for the Visual Arts case, a fair use analysis focuses “on the specific use alleged to be infringing.” 598 U.S. ___ (slip op. at 37).

Even if the first fair use factor does not favor fair use here, however, the Supreme Court opined in a footnote in the Andy Warhol Foundation for the Visual Arts case that “straight copying may be fair if a strong showing on the fourth factor outweighs a weak showing on the first.” See 598 U.S. ___ (slip op. at 24, n.12). In the Google case, the Second Circuit in its market substitute analysis noted that the Google Books snippet views only “produce[d] discontinuous, tiny fragments, amounting in the aggregate to no more than 16% of a book.” Google, 804 F.3d at 224. Notably, the Times did not cite any percentages in its Complaint to show how much of each copyrighted article was copied, but rather claimed the GPT-powered product outputs involved copying that was “significant” or “significantly more expressive content from the original article than what would traditionally be displayed in a Bing search result for the same article.” Complaint ¶ 123. Without reviewing each referenced Times piece, it is difficult to determine whether the GPT outputs are substantial or merely “tiny fragments” of a much larger article. It is not farfetched to imagine, however, that if the Times could have used the GPT-powered products to reproduce an entire Times article, it would have done so and demonstrated as much in its Complaint.

Conclusion

Practically, whether the GPT-powered products serve as a market substitute for the Times’ paywalled content, with respect to diverting its subscribers, is ripe for debate. As the Second Circuit put it, it depends on “whether the copy brings to the marketplace a competing substitute for the original, or its derivative, so as to deprive the rights holder of significant revenues because of the likelihood that potential purchasers may opt to [use GPT instead of NYT].” Google, 804 F.3d at 223. Moreover, binding precedent (unless and until the case reaches the Supreme Court, at least) states “the possibility, or even the probability or certainty, of some loss of sales does not suffice to make the copy an effectively competing substitute that would tilt the weighty fourth factor in favor of the rights holder in the original. There must be a meaningful or significant effect ‘upon the potential market for or value of the copyrighted work.’” Google, 804 F.3d at 224 (quoting 17 U.S.C. § 107(4)). Here, the Times has demonstrated that the GPT-powered products have the capacity to reproduce large portions of copyrighted material and to cause potential harm to its subscriber revenues. Without actual evidence of such harm, however, it is difficult to conceive that the GPT-powered products have the potential to meaningfully or significantly divert subscribers away from the historic daily publication. One might think to oneself: “If I want to read articles from the Times on a regular basis, I will not go through the effort of asking ChatGPT to reproduce those specific articles on a paragraph-by-paragraph basis and on an article-by-article basis; and if I have the need to access information published by the Times fairly infrequently, then I probably would not be a subscriber in the first place. I would try to find information elsewhere that is not behind a paywall.”

From a licensing standpoint, however, the Times’ arguments are compelling, in that demand for licensing Times copyrighted works may be supplanted. Indeed, OpenAI and Microsoft offer their own licenses for the challenged products to corporate clients in exchange for licensing fees payable to OpenAI and Microsoft, respectively. If the challenged technology can reproduce content that a corporate client would otherwise need to pay for through the Times’ licensing agreements, clients may instead opt to use GPT-powered technologies to fulfill that same purpose (in addition to exploiting the revolutionary technology for its myriad other uses). As the Court echoed in the Andy Warhol Foundation for the Visual Arts case, “the ‘central’ question . . . is ‘whether the new work merely “supersede[s] the objects” of the original creation . . . (“supplanting” the original), or instead adds something new, with a further purpose or different character.’” 598 U.S. ___ (slip op. at 15) (quoting Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994)).

Notwithstanding the foregoing, the tides could turn at any time. In a strong fair use case, there is hardly ever a truly “correct” answer. Moreover, the analysis is highly fact-dependent, so additional evidence may shift the balance in the other direction. Stay tuned to this case, because it has the potential to be significant.


[1] Named defendants include, in addition to Microsoft, (i) OpenAI, Inc., (ii) OpenAI LP, (iii) OpenAI GP, LLC, (iv) OpenAI, LLC, (v) OpenAI OpCo LLC, (vi) OpenAI Global LLC, (vii) OAI Corporation, LLC, and (viii) OpenAI Holdings, LLC (collectively, “OpenAI”).

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations.

© DarrowEverett LLP | Attorney Advertising
