Meta plans to use Facebook data to train AI: this could end badly

It comes as no real surprise that Meta uses personal data from users for a variety of functions, including ad sales. But according to a report by Bloomberg, the company is also planning to leverage that personal data to train its Meta AI tool.

During an earnings call this week, Meta CEO Mark Zuckerberg announced the decision to train AI on user data. Zuckerberg claimed, "On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well."

A larger set of training data generally does make for a more capable AI, and Meta does sit atop a massive mountain of user data. However, the plan to train Meta AI on that data set raises various ethical, regulatory, and copyright issues.

The potential conflicts are massive

There are major privacy concerns whenever a company trains AI on user data. Facebook has 3 billion users and Instagram has an additional (though potentially overlapping) 1.5 billion. That is a massive share of the internet-dwelling public. Sure, that gives Meta a large data pool. But it also means as many as 4.5 billion people are having their personal data used to build an artificial intelligence.

That's not only an ethical concern, but also a major compliance issue with global data protection laws.

OpenAI, Microsoft, Google, Adobe, and others have been rightfully criticized for using copyrighted data to train their AI models. Meta faces the same exposure, because Facebook is not just a place for individual users but also for companies, theater troupes, artists, musicians, writers, and content creators who post their own copyrighted work.

So in addition to violating the privacy of its users for the sake of greed once again, Meta stands to face legal action on the copyright and data protection fronts as well.

Never mind the veritable hornet's nest of toxicity and bias contained in the Meta dataset. Facebook is nearly synonymous with misinformation at this point, and Facebook's content moderation software is still nowhere near good enough at blocking hate speech or harmful conspiracy theories. And with Zuckerberg claiming such a massive data pool, that pool may also include Meta's historical archive, which would include all of the information that has since been blocked or removed by recent changes to Facebook's content moderation systems.

Training AI on that unfiltered data will almost certainly lead to the AI spitting back hate speech, because the internet is a place full of absolute filth and an AI model has no capacity to ignore what it was trained on. It will only replicate it.

What this means for users

If you're on Facebook or Instagram, your data has already been sold. And as OpenAI proved, your data can be used even if the AI company doesn't own that data.

Sure, you can delete your social media accounts, but web caches do exist, and simply deleting an account doesn't get rid of all the stored data on Meta's back end.

The best way to protest Meta's AI policies is to not use its AI tool and to join any class action or copyright infringement suits that apply to you. At this point, there are so many AI alternatives that ignoring the Meta version should be simple enough.
