To me, artificial intelligence is a lot like magnets: I have no idea how they work. But I do understand, in a very general sense, that AI is not actually intelligent. It's just data, collected on a massive scale, algorithmically digested, and spit out in conversational tones designed to make us think that the machine is "smart."
The popular versions of these systems, like ChatGPT, live and die based on the amount of data they can harvest, which essentially means they're reliant on you. And in case there's any doubt about what "you" means in this particular context, Google (via Techspot) has updated its privacy policy to explicitly state that pretty much anything you say or do online can be scooped up and used to train its AI models.
Naturally, Google collects data from your online activity, like the stuff you search for, the videos you watch, the things you buy, and the people you talk to, as well as the location data accessed through your Android mobile device. But "in some circumstances," it also collects information from "publicly accessible sources": If your name appears in a local newspaper article, for instance, Google may index the article and then share it with people searching for your name.
That in itself isn't new: What's changed, as can be seen on Google's policy updates page, is how Google says it can use the information it picks up from those public sources. Previously, the policy stated that publicly available data could be used "to help train Google’s language models and build features like Google Translate." The latest update broadens the policy considerably: "We may collect information that’s publicly available online or from other public sources to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities."
Bard, announced earlier this year, is essentially Google's answer to ChatGPT, and as with other AI models, its rollout hasn't been entirely smooth sailing. In April, for instance, a report claimed that several Google employees had urged the company not to roll out Bard because the information it provided in response to queries was "worse than useless" and effectively made the chatbot a "pathological liar."
More data should, in theory at least, lead to better results for Google's bots. But updated privacy policy or not, the legal status of this behaviour has not been clearly established. OpenAI is facing multiple lawsuits over the way it harvests and uses data to train ChatGPT: Policies like the one recently implemented by Google might seem to make some of it fair game, but as The Washington Post reported, AI models will hoover up pretty much anything from Wikipedia pages to news posts and individual tweets, a habit that a growing number of people take issue with.
And not all of the material in question is in fact fair game: Authors Mona Awad and Paul Tremblay recently filed their own lawsuit against OpenAI, alleging that ChatGPT violated copyright laws by using their works to train its AI model without permission.
I've reached out to Google for more information on its reasons for changing its privacy policies, and will update if I receive a reply.