AI + copyright = copywrong?

Gönül Aycı, PhD
2 min read · Jun 23, 2024


Generative AI (GenAI) models can write persuasive text, generate realistic images, and even compose music. Their secret? Data. Indeed, data is all you need! These models are trained on content crawled from the Web (sometimes including paywalled articles), which raises concerns about copyright, ethics, privacy, and more. In this post, let’s talk about the copyright concerns and the open questions that came to mind as I read about real-life incidents.

DALL-E generated image.

Past: The Authors Guild v Google

In 2005, the Authors Guild and the Association of American Publishers filed lawsuits against Google, claiming copyright infringement for scanning and digitizing books without explicit permission from authors and publishers. In 2013, the U.S. District Court sided with Google, ruling that its project qualified as “fair use.” The Second Circuit Court of Appeals upheld this decision in 2015, affirming that Google’s work served the public interest and did not violate copyright law [1].

Present: The New York Times v OpenAI

Fast-forward to the end of 2023, when a new legal battle emerged: The New York Times sued OpenAI. The lawsuit claims that OpenAI’s models memorized content from New York Times articles and reproduce it without permission, citing 100 specific examples involving GPT-4 [2]. This case could become a turning point in how the legal system views the use of copyrighted material for training AI models, and its outcome may significantly influence future practices in the AI industry. Naturally, the topic has stirred a lot of discussion and differing viewpoints [3].

The first example in The New York Times’s lawsuit document [2].

Four Ongoing Questions

  1. How should regulators define clear boundaries for fair use in the context of AI?
  2. Can citing sources mitigate or avoid copyright infringement concerns?
  3. Is financial compensation for data usage an adequate way to resolve copyright disputes?
  4. If paying for data use is deemed sufficient, does this create a dead end for the little fish in a big pond who cannot afford these costs in a competitive landscape?
