What you need to know
- A new study suggests that more than 57% of the content available on the internet is AI-generated.
- AI tools like Copilot and ChatGPT depend on information from the internet for training, but the infiltration of AI-generated content into the internet limits their scope, leading to inaccurate responses and misinformation.
- If copyright law prohibits training AI models on copyrighted content, the responses generated by chatbots will likely worsen and become more inaccurate.
With the rapid adoption of generative AI, it's becoming increasingly difficult to tell what's real. From images and videos to text, AI tools are arguably at their peak and can generate sophisticated outputs based on prompts.
There's been a constant battle between publishers and the companies behind these AI tools over copyright infringement. While OpenAI CEO Sam Altman admits it's impossible to create tools like ChatGPT without copyrighted content, copyright law doesn't prohibit the use of the content to train AI models.
A new study published in Nature suggests 57% of content published online is AI-generated (via Forbes). Researchers from Cambridge and Oxford claim the growing volume of AI-generated content, and the overreliance of AI tools on that same content, can only lead to one result: low-quality responses to queries.
Per the study, AI-generated responses to queries degraded in value and accuracy with each successive attempt. According to Dr. Ilia Shumailov from the University of Oxford:
“It is surprising how fast model collapse kicks in and how elusive it can be. At first, it affects minority data—data that is badly represented. It then affects diversity of the outputs and the variance reduces. Sometimes, you observe small improvement for the majority data, that hides away the degradation in performance on minority data. Model collapse can have serious consequences.”
According to the researchers, the degradation in the quality of chatbot responses stems from a cyclical overdose of AI-generated content. As you may know, AI models depend on information on the internet for training. As such, if that information is itself AI-generated and inaccurate, training becomes ineffective and the models go on to generate wrong answers and misinformation.
AI chatbots are lying to themselves
The researchers decided to dig deeper in an attempt to uncover the root cause of the issue. Right off the bat, it can be attributed to an increase in AI-generated articles being published online without fact-checking. The team used a pre-trained AI-powered wiki for its experiments, repeatedly training the tool on its own outputs. The team immediately noticed a decline in the quality of the information generated by the tool.
The study further highlights that the AI tool dropped rare dog breeds from its knowledge scope after repeated training cycles, despite being trained on a wide library of information about dog breeds from the get-go.
To this end, the quality of search results will likely worsen with the prevalence of AI and the publishing of AI-generated content online.
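To get a feel for why rare items like the dog breeds above disappear, here is a toy sketch in Python. It is not the researchers' method: the "model" is just a table of breed frequencies, the breed names, counts, and the `simulate_collapse` helper are all hypothetical, and each "generation" simply re-estimates the table from a small sample drawn from the previous one. The one property it illustrates is that once a breed fails to show up in a sample, the next model has no way to bring it back.

```python
import random
from collections import Counter


def simulate_collapse(initial_counts, generations, sample_size, seed=0):
    """Toy illustration: refit a frequency table on its own samples.

    Returns a list of sets, one per generation, holding the categories
    the 'model' still knows about (index 0 is the original data).
    """
    rng = random.Random(seed)
    counts = dict(initial_counts)
    history = [set(counts)]
    for _ in range(generations):
        categories = list(counts)
        weights = [counts[c] for c in categories]
        # "Generate" a synthetic dataset from the current model...
        samples = rng.choices(categories, weights=weights, k=sample_size)
        # ...then "retrain" by using the sample frequencies as the new model.
        counts = dict(Counter(samples))
        history.append(set(counts))
    return history


if __name__ == "__main__":
    # Hypothetical starting data: one common breed, one less common, one rare.
    breeds = {"labrador": 900, "poodle": 90, "otterhound": 10}
    for generation, known in enumerate(simulate_collapse(breeds, 10, 50)):
        print(generation, sorted(known))
```

Which breeds survive, and for how long, depends on the random seed and the sample size. What never changes is the direction: the list of known breeds can only stay the same or shrink from one generation to the next.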