Hi there. It’s Rachyl Jones, the tech reporting fellow. OpenAI has a little weasel scurrying around the internet, collecting up-to-date information to train the large language models behind ChatGPT. But it has run into a few locked doors recently.
A series of news organizations, including the New York Times, CNN, Reuters, and the Chicago Tribune, have blocked ChatGPT from accessing their content, the Guardian reported on Thursday. According to the Guardian, the Australian Broadcasting Corporation and Australian Community Media, which owns 100 local publications, have also sealed off their websites.
The weasel I’m referring to is actually a web crawler called GPTBot, which OpenAI launched earlier this month. “Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” OpenAI says on its website.
When ChatGPT launched, it had been trained only on information up to September 2021. Stuck in the year Joe Biden became president and the U.S. military exited Afghanistan, ChatGPT needed to learn new things to stay relevant in the heated artificial intelligence race. OpenAI has been working to address this, from temporarily launching a Browse with Bing feature to setting GPTBot free to scour the internet. But “do not enter” signs from major media organizations, which often have the most recent and relevant news, could present a problem for the chatbot.
The terms of service for the Times, Reuters, and the Tribune all explicitly state that users may not scrape their content. The Times’ terms specifically say its content cannot be used to train A.I. programs. Publishers are in the business of selling information, whether through subscriptions or advertising. Either way, they need people to visit their websites to make money. Handing their content to chatbots for free, which might remove readers’ need to visit publishers’ websites, could hurt their revenue. News outlets have already been struggling to rework their business models after the rise of social media drew advertising dollars away from traditional media. The knife is in publishers’ backs, and they’re trying to keep A.I. from twisting it.
By blocking GPTBot, these media organizations could be pressuring OpenAI to pay for access. Last month, OpenAI struck a deal with the Associated Press to license its news stories for A.I. training purposes. It is unclear how much OpenAI paid, but other publishers may want similar arrangements. Google, which also scrapes publishers’ sites to train its large language model, could strike comparable deals with news outlets that have locked it out.
Here’s what else is going on in tech today.
Rachyl Jones