
Reviewing Copyright & Generative AI: The Good, the Bad, and the Unanswered


The term “artificial intelligence” is no longer confined to the pages of science fiction or the runtime of Blade Runner. But don’t let that fool you. AI has been a part of our lives longer than you may realize. From email automation to voice-command assistants like Alexa and Siri, AI has become a prevalent and useful facet of technology. Over the past five years, however, the emergence of generative AI models like ChatGPT, Midjourney, and DALL-E has raised new ethical and legal questions, and the intellectual property implications of both AI programs and their outputs are reaching a boiling point.

Copyright claims related to AI are complex, and the law has not yet caught up to these advancements. Attorneys, legislators, and scholars alike are already contemplating the privacy, intellectual property, free speech, equal employment, and national security implications of artificial intelligence. Among these considerations, copyright law has been a major point of contention.

What is Generative AI?

Generally speaking, artificial intelligence is software, firmware, and/or hardware trained to solve complex problems or complete tasks by mimicking human thought or action. Make no mistake: AI cannot think, nor is it sentient. The philosophical implications of that possibility are captured in the still-unrealized idea of Artificial General Intelligence (AGI), but no truly intelligent autonomy exists in today’s models.

In practice, AI cannot operate independently or of its own mind (it doesn’t have one). Nor does it act at random: it is bound by the instructions, guardrails, and training parameters of its developer. Further, “artificial intelligence” is an umbrella term covering a broad range of probabilistic systems, from robotics to voice-assistant products (Siri and Alexa), website helpdesk chatbots, and reactive machine-learning models. Here, we’ll focus on generative machine-learning models like Midjourney, ChatGPT and GPT-4 (OpenAI), DALL-E 2 (OpenAI), and Bard (Google).

There are two primary categories of commonly used, public-facing “generative AI.” First, there are large language models (LLMs). These models, like Bard or ChatGPT, are supervised or semi-supervised chatbots that generate predictive, text-based responses to prompts from a user. The second category consists of large vision models (LVMs), which operate similarly to LLMs but train on and generate images instead (see Midjourney and DALL-E 2). These models are trained on vast data sets and respond to prompts by making predictions based on the data they have previously ingested. Many are semi-supervised, meaning the model is initially tuned on a pre-defined data set and then continues to learn from user prompts and reactions so that its generations become more accurate over time.
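To make “predictions based on past data ingestion” concrete, the sketch below is a deliberately toy illustration in Python. It is not how any commercial model works; real LLMs rely on neural networks with billions of learned parameters and subword tokens rather than simple word counts. The toy version merely counts which word follows which in a tiny “training” text, then continues a one-word prompt by repeatedly sampling a likely next word, capturing the predict-the-next-token intuition at drastically reduced scale.

```python
import random
from collections import defaultdict

# A tiny "training set" standing in for the vast corpora real models ingest.
TRAINING_TEXT = (
    "the model reads text and learns which word tends to follow which word "
    "the model then uses what it learned to predict the next word "
    "the model repeats that prediction to build a full response"
)

def train(text):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, prompt_word, length=10):
    """Continue a one-word prompt by repeatedly sampling a likely next word."""
    word = prompt_word
    output = [word]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break  # nothing was ever observed after this word; stop
        choices, weights = zip(*followers.items())
        word = random.choices(choices, weights=weights)[0]
        output.append(word)
    return " ".join(output)

counts = train(TRAINING_TEXT)
print(generate(counts, "the"))
```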

Functionally, the user-facing process looks like this: a user inputs prompt [X value] and the model generates [Y output] based on learned weights and the prompt’s similarity to previously ingested data. Here is an example from ChatGPT:

User Prompt: Explain briefly how generative, machine-learning, text-based AI models work as though you were explaining it to a fifth grader.

ChatGPT 3.5: Sure! Imagine a computer that loves to read books. It reads so many books and learns how words and sentences work. Then, when you ask it a question, it uses what it learned to create new sentences and give you an answer. It’s like having a really smart friend who learns from books and can talk to you!
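The same kind of exchange can also be performed programmatically rather than through the chat interface. The sketch below is a minimal, illustrative example using OpenAI’s Python library; it assumes the openai package (version 1 or later) is installed and an API key is available in the OPENAI_API_KEY environment variable, and the model name shown is merely a stand-in, not a recommendation.

```python
# Illustrative only: assumes the `openai` package (v1+) is installed and
# OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[
        {
            "role": "user",
            "content": (
                "Explain briefly how generative, machine-learning, text-based "
                "AI models work as though you were explaining it to a fifth grader."
            ),
        }
    ],
)

print(response.choices[0].message.content)
```

Either way, the underlying process is the same: the prompt is sent to the model, and the model returns its predicted response.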

The Legal Landscape of Copyright & AI

Copyright applies to artificial intelligence in two categories: ingestion and outputs. Ingestion refers to the training (or “tuning”) process, in which the developer uses pre-defined data sets to teach the model to react accurately to prompts and generate outcomes based on similarities to its training data. Outputs refer to the model’s actual generations. In LLMs, this is the text-based response to a user prompt; for LVMs, it is the image(s) generated. Copyright protects original works of authorship from unauthorized reproduction or display by those who are not the copyright owner.[1] These rights vest in the author the moment the work is fixed in a tangible medium, though a copyright owner obtains valuable additional protections by filing a registration with the U.S. Copyright Office. To examine how copyright law applies to AI, the legal landscape is best viewed through the lens of three parties with potential copyright interests in AI-related works.

Parties With Potential Copyright Interests

AI Developers

AI developers may have several copyright interests related to their models. First, and perhaps most settled, developers may hold copyright in the software of the AI model itself. While it is established that developers can obtain copyright protection over computer code, copyright’s merger and useful article doctrines complicate matters when it comes to protecting utilitarian works like software. Licensing plays a role as well: a developer’s ability to claim copyright over his model can be frustrated if he used open-source software as the base for the program. Open-source code is software offered freely to the public under a general use license by its developer to incentivize innovation. While these licenses allow for use, that does not mean the original developer has relinquished his copyright. In fact, the opposite is true: the existence of a license implies the assertion of an otherwise protected right, though many open-source licenses provide that platforms developed using the open-source code are the exclusive IP of the platform’s developer. While enforcement is unlikely unless the developer uses open-source software in an unethical manner, AI developers should be mindful of the scope of their rights in the models they develop.

Further, AI developers have already attempted to claim copyright over the outputs generated by their AI models. Stephen Thaler sought to register an image titled “A Recent Entrance to Paradise,” created using Thaler’s “Creativity Machine,” a generative AI model, and the Copyright Office refused to register the image. In Thaler v. Perlmutter, decided in 2023, the U.S. District Court for the District of Columbia agreed with the U.S.C.O., explaining that copyright authors must be human.[2] The case is currently pending before the U.S. Court of Appeals for the D.C. Circuit.[3] Some have taken the decision to mean that AI-generated works can never be the product of human authorship. This is not the case. The decision noted that Thaler, in his copyright application, listed the AI as the author and himself as the owner of the copyright through the work-for-hire doctrine. Thaler claimed that, because he created the AI and the prompt, he was essentially the AI’s “employer” and entitled to works that, he argued, would otherwise belong to the model itself. As such, copyright in AI-generated outputs remains an unsettled issue. Nonetheless, in cases where the developer is not the prompter, the developer likely does not exert sufficient control over the creative process to claim the work as his own.

Authors With Pre-Existing Works

The second interested party consists of authors of pre-existing works used to train AI models. This raises the question whether a developer’s training of an AI on data sets (which often include millions of unfiltered data points) constitutes an unauthorized reproduction of the copyrighted works contained therein. Since 2022, nationwide copyright litigation has erupted, with various authors suing AI developers for copyright infringement arising from the model training process.[4] Late in 2023, a federal court in the Southern District of New York became the battleground for a set of juggernaut AI-copyright cases. Filed on December 27, 2023, The New York Times Co. v. Microsoft Corp., et al. embodies the essence of the conflict, alleging that generative AI models facilitate mass copyright infringement: in that case, the mass reproduction of the Times’ journalism, among other data, in training OpenAI’s GPT models.[5] Daily News, LP, et al. v. Microsoft Corp., et al., filed in April 2024 in the same New York district court, is nearly identical.[6] In June 2024, Microsoft and the other OpenAI defendants moved to consolidate both cases into a single, eighteen-party dispute. Both cases raise more questions than they are likely to answer. Still, deciding whether the use of copyrighted works to train AI models constitutes infringement or fair use, or instead warrants some compulsory licensing structure, is the threshold question for applying copyright law to increasingly autonomous technologies.

That is not to say that AI developers are doomed. Many argue that training AI models constitutes a fair use of copyrighted works, in part because the use does not usurp the copyright owner’s intended market. Whether AI training falls within copyright’s fair use protections will depend on how courts square the scope of “transformative” uses with two recent Supreme Court decisions, Google LLC v. Oracle Am., Inc. (2021) and Andy Warhol Found. for the Visual Arts v. Goldsmith (2023).[7] In Google, Google’s copying of Oracle’s Java API declaring code to build the Android platform was held to be transformative; in Warhol, the licensing of Warhol’s silkscreen based on Goldsmith’s photograph of Prince was not. Both cases involve the direct copying of substantial elements of the underlying work, yet they reach different results. One major distinction concerns how much weight should be given to the commercial nature of a use in drawing the boundary between infringement and fair use.

Prompt Engineers

The third and final category of interested parties is the prompt engineer, or prompter. These are the people who use generative AI services like ChatGPT and Midjourney to create writings or images. The precise question here is one the court avoided in Thaler: is a prompter the author of a work generated by AI? The question remains open, and the Copyright Office has, perhaps too hastily, begun to answer it. In February 2023, the U.S.C.O. refused to register the individual images in the graphic novel Zarya of the Dawn because the images were created by AI.[8] It did, however, allow the compilation of the images as a graphic novel to be protected as its own work. Nonetheless, the Copyright Office seemed to recognize that this blanket refusal was problematic and issued a request for public comment on the subject in August 2023. The Copyright Committee of the American Intellectual Property Law Association (AIPLA) drafted commentary addressing the application of copyright law to AI in October 2023.[9] In the commentary, the AIPLA emphasized the importance of flexibility in determining what constitutes a “human creation” for purposes of copyright authorship. A blanket rejection may raise questions about findings of authorship in other media in which the creator’s control over a work’s moment of fixation is attenuated (e.g., certain forms of photography use automatic exposures and slow shutter speeds to capture many random movements in a single frame, leaving the photographer with limited control over her subject). As of this post, the question of authorship in AI outputs remains unanswered, but affording some level of copyright protection to prompters for their generations could further the economic and creative incentives that form copyright law’s constitutional purpose.

However, prompters should be wary of the outputs they generate, because an output can itself infringe an existing work. For example, if I ask Midjourney to create an image of “an old man standing on top of the Empire State Building dressed as Spider-Man,” the use of the Spider-Man design may infringe Marvel, Sony, or Disney’s copyright, not only in the suit design but in the character itself. The use of AI does not diminish the rights of existing copyright owners, and prompt engineers should take care to ensure they are not infringing the works of others.

Ultimately, the advent of generative AI has raised many concerns regarding intellectual property. Whether copyright protection exists for generative AI outputs, and whether the ingestion process constitutes fair use or infringement per se, are vital questions, the answers to which will become the sine qua non of the relationship between IP and AI. The landscape is evolving, and the need for developers, prompters, and existing authors to seek legal advice will only become more pressing as litigation escalates. Yet it seems inevitable that some form of copyright protection, however thin, will attach to works created with the assistance of AI. From copyright concerns to questions of data privacy and information security, consulting with an intellectual property attorney regarding the use, integration, or development of generative AI models is more important than ever.


[1] See 17 U.S.C. §§102, 106.
[2] See Thaler v. Perlmutter, 2023 U.S. Dist. LEXIS 145823 (D.D.C. 2023).
[3] See Thaler v. Perlmutter, No. 23-5233 (D.C. Cir. filed Oct. 18, 2023).
[4] See generally Doe 1, et al. v. GitHub, Inc., et al., No. 4:22-cv-06823, 2023 WL 3449131 (N.D. Cal. Nov. 13, 2022); Andersen v. Stability AI Ltd., No. 3:23-cv-00201, 2023 WL 7132064 (N.D. Cal. Jan. 13, 2023) (dismissing many of the plaintiffs’ claims); Getty Images (US) Inc. v. Stability AI, Inc., No. 1:23-cv-00135 (D. Del. filed Feb. 3, 2023); Kadrey, et al. v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal. filed July 7, 2023); Silverman v. OpenAI, Inc., No. 3:23-cv-03416 (N.D. Cal. filed July 7, 2023); Authors Guild, et al. v. OpenAI, Inc., et al., No. 1:23-cv-08292 (S.D.N.Y. filed Sept. 18, 2023) (consolidated Feb. 6, 2024); The New York Times Co. v. Microsoft Corp., et al., No. 1:23-cv-11195-SHS (S.D.N.Y. filed Dec. 27, 2023); Daily News, LP, et al. v. Microsoft Corp., et al., No. 1:24-cv-03285-SHS (S.D.N.Y. filed April 30, 2024).
[5] See The New York Times Co. v. Microsoft Corp., et al., 1:23-cv-11195-SHS (S.D.N.Y. filed Dec. 27, 2023).
[6] See Daily News, LP, et al. v. Microsoft Corp., et al., 1:24-cv-03285-SHS (S.D.N.Y. filed April 30, 2024).
[7] Compare Google LLC v. Oracle Am., Inc., 593 U.S. 1 (2021) with Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023).
[8] See Zarya of the Dawn (Registration # VAu001480196), U.S. Copyright Off. (Feb. 21, 2023), https://www.copyright.gov/docs/zarya-of-the-dawn.pdf.
[9] Jake Bryant, the author of this article, was a member of the AIPLA’s Generative AI taskforce that drafted the commentary.


These articles are provided for general informational purposes only and are marketing publications of Gentry Locke. They do not constitute legal advice or a legal opinion on any specific facts or circumstances. You are urged to consult your own lawyer concerning your situation and specific legal questions you may have.
