Tom’s Guide
Alan Martin

Google strikes $60m deal with Reddit for AI training data — what you need to know

Google headquarters in California.

Reddit spent the latter half of 2023 considering whether to block the Google and Bing search engines from indexing posts on the site. According to The Washington Post, the aim was to prevent the unauthorized and uncompensated use of its posts to train AI.

Now Reddit has announced it's reached a deal with Google that will, among other things, give the company access to the Reddit Data API “to improve its products and services” which includes “more efficient ways to train models”. In Google’s words, access to said API will grant the company “real-time, structured, unique content from their large and dynamic platform.” 

The deal, which Bloomberg previously suggested would be “worth about $60 million on an annualized basis”, doesn’t stop there. As part of the agreement, Reddit will have access to Google’s Vertex AI service, which should improve internal search results, and the agreement will also allow for “Reddit content to be displayed across Google products.”

Google says this will ensure “more content-forward displays of Reddit information that will make our products more helpful for our users and make it easier to participate in Reddit communities and conversations.” Given the number of people who affix the word “reddit” to searches to surface genuine user-generated insights, that could be a very good thing for the average Google user.

But for Google, the real prize is undoubtedly the vast treasure trove of training data, which will theoretically make its generative AI appear more human, thanks to the posts and comments written by millions of real people every day.

But scale isn’t everything, and in some ways Reddit is an imperfect sample for training artificial intelligence when compared to literature or magazines. Grammar is faster and looser, there are a lot of memes and inside jokes, it’s full of information that’s just plain wrong, and its user base is predominantly male.

By contrast, Apple has reportedly sought multi-million dollar deals with publishers in order to train on their more formal and factually accurate magazines and newspapers. That approach has its disadvantages too: it concentrates on another small part of the human experience at the expense of how everyday people communicate, something Reddit is undoubtedly better at demonstrating.

Expect more such deals to be made public over the next few years, because people are realizing that AI means big money and that training data can’t be absorbed free of charge without consequences. In the last year, OpenAI, Meta and Stability AI have all been hit by lawsuits from authors who claim that their books were used for training without permission or compensation.
