Once again, ChatGPT is under scrutiny in Europe, as OpenAI continues to jeopardize EU citizens' rights to privacy and data accuracy under GDPR.
This is the main takeaway from the preliminary findings of the EU's year-long ChatGPT Taskforce investigation. The European Data Protection Board (EDPB)—the body that unites Europe's national privacy watchdogs—created the taskforce in April last year, after Italy's temporary ban of the app on privacy grounds caused a surge in the use of the best VPN services across the region.
The group analyzed several problematic aspects of the popular AI chatbot, particularly the legality of its web scraping practices and the accuracy of its data. In fact, beyond data collection issues, ChatGPT's tendency to hallucinate has also drawn recent complaints from some national Data Protection Authorities (DPAs).
The EU investigation is still at an early stage, so OpenAI's practices are unlikely to become more GDPR-friendly anytime soon. However, this may be a first step—albeit a tentative one—towards a better privacy framework for large language model (LLM) tools to follow in Europe.
ChatGPT's web scraping and accuracy flaws
After Italy temporarily blocked ChatGPT for improperly collecting and storing data in March 2023, other EU countries, including France, Germany, and Ireland, began investigating the matter. Many complaints have been filed, but there has been little enforcement so far.
The Taskforce was created to promote cooperation between the national DPAs investigating OpenAI's chatbot. However, while it offers preliminary views on aspects contested among DPAs, the report "does not prejudge the analysis that will be made by each DPA in their respective, ongoing investigation," the EDPB explained.
The main contested issue is how ChatGPT collects, retains, and uses EU citizens' data. OpenAI scrapes vast amounts of data from the web without asking for consent. Through chatbot prompts, users can also feed the system highly sensitive data that requires stronger protection. There's also a lack of transparency about how the company ultimately processes this data to train its AI models.
GDPR requires a legal basis for processing personal data—in this case, either obtaining the individual's consent or having a "legitimate interest" in doing so. OpenAI cannot realistically ask every individual for consent before scraping their information from the web. That's why, since the Italian case, the company has largely played the latter card.
While "the assessment of the lawfulness is still subject to pending investigations," the report pointed out that the legitimate interest clause might be legally sound if some technical measures are used. These include avoiding certain data categories or sources (such as public social media profiles). The company should also be able to delete or anonymize personal data, respecting EU citizens' rights such as the right to be forgotten.
Yet, according to AI and privacy expert Luiza Jarovsky (see tweet below): "Legitimate interest has been totally distorted here."
Under Article 14.5(b) of the GDPR, she explained, "the controller shall take appropriate measures to protect the data subject's rights and freedoms and legitimate interests, including making the information publicly available." However, information about ChatGPT's data is anything but publicly available.
"Either the EDPB says that legitimate interest works differently for OpenAI and other AI companies relying on scraping to train AI (and explain why), or they require them to comply with legitimate interest according to the GDPR," said Jarovsky.
🚨 BREAKING: The EDPB has just published its ChatGPT Taskforce Report, and there is a big 🐘 ELEPHANT IN THE ROOM 🐘. Read this: ➡ On web scraping and "collection of training data, pre-processing of the data and training," the report recognizes that OpenAI relies on legitimate… pic.twitter.com/QEiyqhuDqz (May 24, 2024)
Data accuracy is the next big point of contention. We've already discussed how ChatGPT and similar AI chatbots will probably never stop making things up. Not only can "AI hallucinations" fuel misinformation online, they also run counter to EU privacy laws.
Under Article 5 of the GDPR, personal data about individuals in the EU must be accurate. Article 16 requires all inaccurate or false data to be rectified. Article 15 then gives Europeans the "right of access," requiring companies to show what data they hold on individuals and what its sources are. Again, OpenAI doesn't meet any of these criteria.
"As a matter of fact, due to the probabilistic nature of the system, the current training approach leads to a model which may also produce biased or made-up outputs", the report reads.
"Although the measures taken in order to comply with the transparency principle are beneficial to avoid misinterpretation of the output of ChatGPT, they are not sufficient to comply with the data accuracy principle."
Experts suggest that, to act in line with the transparency principle under Article 5, OpenAI should explicitly inform users that generated text may be biased or made up. Likewise, they recommend informing users that information shared via chatbot prompts may be used for training purposes.
The report ultimately stresses the importance of EU citizens' rights under GDPR, like the right to have your data deleted (known as the "right to be forgotten") or rectified, and the right to obtain information on how your data is being processed.
On this crucial point, though, the Taskforce offers no practical recommendations, only broad advice: OpenAI should implement "appropriate measures" and integrate "the necessary safeguards" to meet the requirements of the GDPR and protect the rights of data subjects, experts said.
Are they really expecting OpenAI to come up with a solution on its own? It looks that way.