Hi, and welcome to Eye on A.I. David Meyer here in Berlin, filling in for Jeremy Kahn, who is over at our Brainstorm Tech conference in Utah (more on that later).
Copyright-related lawsuits have been an occasional feature of the generative A.I. boom since it began—a class action over Microsoft’s GitHub Copilot last November; artists and Getty Images suing Stability AI at the start of this year—but the past couple of weeks have seen a real flurry of activity.
The highest-profile suits come courtesy of star comedian Sarah Silverman, who (along with fellow authors Christopher Golden and Richard Kadrey) last Friday went after both Meta and OpenAI over the training of the companies’ large language models (LLMs) on their copyrighted books. The authors are represented by lawyers Joseph Saveri and Matthew Butterick, who also organized the previously mentioned class actions (the Getty suit, which is not a class action, is separate), and who launched a similar class action against OpenAI a couple of weeks ago on behalf of authors Mona Awad and Paul Tremblay.
Here are Saveri and Butterick on the Meta and OpenAI suits: “Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation.” The fact that the LLMs are able to summarize text from the books is, the suits allege, evidence of this training.
Meanwhile, the Clarkson Law Firm filed a pair of class actions on behalf of anonymized individuals: one against OpenAI at the end of June, and another against Google and DeepMind yesterday. Though the gist is similar to the authors’ suits, it’s safe to say the claims here are extremely, er, broad. Here’s how the Google/DeepMind suit begins: “It has very recently come to light that Google has been secretly stealing everything ever created and shared on the internet by hundreds of millions of Americans. Google has taken all our personal and professional information, our creative and copywritten works, our photographs, and even our emails—virtually the entirety of our digital footprint—and is using it to build commercial artificial intelligence products…This mass theft of personal information has stunned internet users around the world.”
Although the suit claims Google “harvested this data in secret, without notice or consent from anyone,” the company says it has been “clear for years” that its A.I. gets trained on publicly available data, and it calls the suit baseless.
It’s becoming clear that, if generative A.I. has an Achilles’ heel (beyond its tendency to “hallucinate”), it’s copyright. Some of these suits may prove more plausible than others, and the cases must run their course, but there does at least seem to be a strong argument that generative A.I. relies on the exploitation of stuff that people have created, and that the business models accompanying the technology make no provision for compensating those people for this absorption and regurgitation.
The copyright issue may also stymie A.I. companies’ prospects in Europe. As Jeremy wrote recently, none of the current foundation models can comply with the EU’s draft A.I. Act, with a common problem being their lack of transparency around the copyrighted data on which they were trained.
OpenAI, which had avoided copyright-infringement suits until late June, appears to be scrambling to appease copyright holders. Last week, it tweeted that ChatGPT’s Browse beta, which connects the chatbot to the internet, was sometimes reproducing the full text of web pages. “We are disabling Browse while we fix this—want to do right by content owners,” the company said.
It may also be relevant to note that, at the Fortune Brainstorm Tech conference this week in Deer Valley, Utah, Microsoft search and A.I. vice president Jordi Ribas told Jeremy that the A.I.-ified Bing is actually sending more—not less—traffic to publishers, “because people are engaging more.” He continued: “To really be successful, we need the publisher and the advertising community to be successful. That’s how the ecosystem works.”
Speaking of Brainstorm Tech, Jeremy also interviewed Anthropic CEO Dario Amodei, who laid out his three-tier system for assessing A.I.’s risks. You can read the full details of the session, but Amodei’s summation is this: “My guess is that things will go really well. But there’s a risk, maybe 10% or 20%, that this will go wrong, and it’s incumbent on us to make sure that doesn’t happen.”
More from the conference here, and more A.I. tidbits below.
David Meyer
@superglaze
david.meyer@fortune.com