The U.S. Copyright Office recently issued a policy statement clarifying its approach to examining registrations of compilations, including claims for the selection and arrangement of independently uncopyrightable subject matter. Going forward, the Copyright Office will no longer register compilations of uncopyrightable subject matter—for example, “a compilation of yoga poses” or a “compilation of rocks.” While this change may seem relatively uncontroversial, the policy statement may have broader implications for the treatment of other types of uncopyrightable subject matter. This blog post addresses the policy’s potential implications for the protection of large datasets: put simply, structuring a data provision as a license may no longer suffice to protect against unwanted use.

Feist’s Phonebook

Relying on the Supreme Court’s decision in Feist Publications, Inc. v. Rural Tel. Serv. Co., as well as the text and legislative history of the Copyright Act, the Copyright Office determined that a selection and arrangement of uncopyrightable elements, unless resulting in a work that is itself copyrightable under 17 U.S.C. § 102, is not copyrightable and therefore not registrable. In Feist, the Supreme Court held that Rural’s alphabetically-arranged phonebook lacked the modicum of originality necessary for copyright protection. Because it took no creative effort on Rural’s part to collect and arrange the facts presented in the phonebook—names, addresses, phone numbers—the Court held that to allow Rural a remedy for copyright infringement against another phonebook provider would be to allow Rural to impermissibly lock down uncopyrightable facts. Taking Feist’s cue, the Copyright Office now states that it will not register a work that claims a “compilation of ideas,” a “selection and arrangement of handtools,” or a “compilation of rocks.”

It has always been the case, though, that a copyrightable work must be fixed in a tangible medium of expression. Thus, a compilation of rocks wouldn’t be copyrightable in any case unless perhaps in the guise of pictures of rocks, which pictures would themselves be copyrightable and thus registrable. Even poetry isn’t copyrightable until put to paper (or sound recording).

How Does Yoga Fit In?

The policy statement arose out of a 2005 case in federal court in the Northern District of California, Open Source Yoga Unity v. Choudhury, in which the district court held that there were triable issues of fact as to whether specific yoga asanas were arranged in a sufficiently creative way to merit copyright protection. In the Copyright Office’s view, Bikram Choudhury, the defendant and progenitor of “Bikram Yoga,” attempted to use a compilation copyright in a series of yoga poses to keep individuals and businesses, including Open Source Yoga Unity, from utilizing the uncopyrightable “facts” (the yoga asanas) underlying his copyright, whether through public performance, public display, or otherwise. The Copyright Office took issue with Choudhury’s claims, stating: “copyright will not extend to the movements themselves, either individually or in combination, but only to the expressive description, depiction, or illustration of the routine that falls within a § 102(a) category of authorship.”

Data vs. Dance Steps

Based on the Copyright Office’s conclusion that a compilation must itself be separately copyrightable under § 102(a), the policy statement analyzes yoga poses (and other exercises) in the context of a choreography claim. It is in this context that the Copyright Office jumps from a fairly standard interpretation of Feist to something new and broader. The policy statement provides that a compilation of “simple routines, social dances, or even exercises” is not registrable unless it results in a choreographic work. The policy statement further provides that:

A claim in a choreographic work must contain at least a minimum amount of original choreographic authorship. Choreographic authorship is considered, for copyright purposes, to be the composition and arrangement of a related series of dance movements and patterns organized into an integrated, coherent, and expressive whole.

While the Copyright Office’s policy statement clearly and immediately affects the copyrightability and registrability of yoga poses and other exercises, it resonates far afield from the arena of deadlifts and yoga asanas as well.

First, the statement seems to move the Copyright Office closer to becoming an arbiter of expression. Typically, courts tend to steer clear of similar conclusions for fear that they’ll get into the business of distinguishing “art” from “not art,” which almost invariably results in distinguishing “good art” from “bad art.” And raising the lower bound of the modicum of creativity requirement for choreographic works may have unintended ripple effects on the copyrightability (and therefore, registrability) of marginal literary works (computer software) and photographic works (product shots).

Second, it’s difficult on the Copyright Office’s proposed basis to distinguish a datum from a social dance step. At what point does the modicum of creativity idea kick in to make a compilation of either copyrightable? For choreography, would the insertion of random, unconnected movements (for example, a piece in which two “social dance steps” are bookended by spastic body movements) suffice? How can you select and arrange data in a copyrightable way? At the very least, the policy statement raises many questions that it fails to answer.

Copyrightability and Big Data

While rocks, handtools, and yoga poses may seem remote from the world of big data and computer-based analytics, they are actually quite closely intertwined. A single datum is very much an uncopyrightable fact. The number of friends you have on Facebook or followers you have on Twitter, and whether you’ve clicked any Sponsored Links on Google in the last three days (and which links you clicked), are all facts existing in the world independent of their expression in a spreadsheet somewhere. Whether compilations of similar data are copyrightable has long been a subject of furious debate in legal circles, both before and after Feist. This debate resulted in passage of the Database Directive in the European Union, which extended legal protection to databases, including through a sui generis right separate from copyright. However, no similar laws have been passed in the United States. As a result, copyright protection for databases has been more tenuous here over the years.

The implication of the Copyright Office’s new policy is to renew doubt about the protectability of these datasets. Although a dataset may in its entirety be a copyrightable compilation (if a court concludes that it, for some reason, possesses the requisite originality to be considered a literary work), this copyright will not protect against the user who poaches a single datum (or more) at a time. And the Copyright Office’s policy arguably evinces its intent to eradicate even the possibility of a “thin” copyright in selection and arrangement. As a result, big data could end up the 21st-century version of Feist’s phonebook.

In recent years, companies have begun to recognize the value of enormous datasets. For example, if a company can accurately gauge the buying preferences of millions of 18- to 25-year-olds based on data gathered from various social media, it can more easily market to that group and drastically increase its profits. Recognizing this, companies specializing in data management and analytics have stepped in to fill the need for accurate and efficient manipulation and analysis of large datasets.

Traditionally, data providers and aggregators have asserted ownership in their datasets and, on this basis, structured their agreements as licenses. For example, a typical data license might read: “I hereby grant you the right to access and use any data provided under this agreement.” In a world in which datasets are not separately copyrightable, though, such a license may be superfluous except to the extent that a court might read it as a contractually created negative covenant, requiring the “licensee” to refrain from taking actions in derogation of the rights explicitly set out in the agreement. While it may seem obvious, simply stating that one owns something doesn’t actually create an ownership interest in that thing. As a result, parties’ rights in data under many of these agreements may be ambiguous at best.

If the Copyright Office’s interpretation takes hold in the courts, we may (and probably should) see many more data companies moving toward pure contractual arrangements in which no affirmative rights in data are granted, but rather broad restrictions are placed on the use and disclosure of data categorized as the discloser’s confidential information. As the copyright system is no longer (if it ever was) a strong means of protecting valuable proprietary data, companies are likely to find themselves seeking protection under the umbrella of trade secret law.

This is problematic for agreement drafters from the standpoint of legal certainty, as jurisdictions differ significantly as to the scope and application of trade secret protection, while the Copyright Act applies uniformly nationwide. Although all states but Texas, New York, North Carolina, and Massachusetts have adopted the Uniform Trade Secrets Act (“UTSA”) in some form, state courts often differ substantially in their interpretation of the UTSA. And many states have amended the act as implemented, further differentiating the available protections.

Vigilance will be required on both sides of a negotiation to ensure that confidentiality and choice of law provisions are drafted very carefully. Boilerplate confidentiality provisions may be insufficient to protect datasets from unwanted use, and where a company’s value is in its data, that data’s protection can be the difference between failure and flourishing.