AI has arguably become the biggest – and most contentious – topic in the world of art and design in recent years. For every impressive example of prompt-generated images or video, there's a question of ethics and copyright, not to mention the perceived existential threat to creative jobs.
Many of the biggest brands creating AI tech have been eager to emphasise their ethical credentials, with Adobe Firefly, for example, committing to transparency and authenticity by only training its models on content it is licensed to use. But according to a new exposé, Google appears to consider the entirety of YouTube to be fair game.
The New York Times claims that OpenAI (of ChatGPT fame) used its Whisper speech recognition tool to transcribe millions of YouTube videos, with the transcripts then used to train GPT-4.
The most damning claim, though, is that Google was aware of the practice but did not intervene, despite it contravening YouTube's own policies on unauthorised content scraping. This, the report claims, is because Google was already training its own AI, Gemini, on YouTube videos. Matt Bryant, a spokesperson for Google, told the New York Times that Google did not know OpenAI was training its models on YouTube videos, but the report suggests several people at Google were aware of the practice and took no action because the company was doing the same thing itself.
The suggestion that two major AI players have trained their models on millions of YouTube videos will do nothing to allay the fears of those who believe AI is committing mass copyright infringement.
The report echoes the outcry over a recently leaked document which revealed that Midjourney was trained on the work of over 16,000 artists. But with both OpenAI and Google implicated in this new report, we could be looking at the most significant AI controversy yet.