Everyone is suddenly abuzz about AI. Since OpenAI released ChatGPT, the idea of artificial general intelligence seems like a plausible reality. Whether you believe AI will benefit the world or bring about disaster, it’s hard to deny how compelling ChatGPT/GPT-4 is when it comes to the quality of its answers and the range of tasks it can perform.
This has sparked an “AI arms race”, with the largest tech companies all desperately investing in their own AI capabilities after being beaten to the punch by OpenAI.
Permissionless data
ChatGPT was trained using a vast collection of written material found online — such as a collection of some 7,000 unpublished books; WebText, a dataset built from millions of “high quality” outbound links posted to Reddit; CommonCrawl, a web archive repository containing petabytes of web data; Wikipedia entries; and more.
ChatGPT also used human trainers, including “labellers” and feedback managers who constantly refined its language models. Once the software was further developed, OpenAI released it to the public for more feedback.
This model of using a vast collection of free, public data points to monetise products is a favourite of Silicon Valley. The big tech approach of using human data, labour and output without permission, compensation or recourse has meant that a small handful of companies become the gatekeepers of new technology.
But even some Big Tech insiders are starting to get worried. Geoffrey Hinton, dubbed “the godfather of AI”, recently resigned from Google, admitting concerns about the speed and development of the technology, calling the current chatbots “quite scary”.
He follows other Google employees who have raised the alarm on AI, including Timnit Gebru and Margaret Mitchell, who both spearheaded Google’s AI ethics team. Despite these warnings, Google is ploughing ahead.
Human toll
OpenAI trained ChatGPT on publicly available datasets, benefiting from the collective work of millions. Google’s search engine indexes all publicly available websites to create a directory of information that it profits from. Its other applications — like Maps, Gmail and Google Docs — combine datasets collected from millions of users to create marketable user profiles to power its advertising business. Facebook harvests all the data around our habits, likes, dislikes and personal connections to micro-target advertising.
Then there are considerations around copyright, intellectual property and moral rights. While the current datasets may not have been subject to copyright restrictions, where will this type of appropriation end?
AI models require ever larger and more diverse datasets to truly capture the nuances and subtleties of expression. One can imagine that copyrighted works of literature, art and music are next. Already Google is advocating for breaking open copyright law in Australia to better accommodate its AI systems.
AI might seem like magic, but in reality, it benefits from the work of many humans, often in an exploitative and extractive manner.
OpenAI outsourced its labelling function to help make ChatGPT less toxic by hiring workers in Kenya who were paid less than $2 per hour. Facebook has been notorious for a similar practice for years, hiring poorly paid moderators to sift through harmful content, resulting in significant mental health issues and PTSD.
An ecosystem of labour
In her book Atlas of AI, scholar Kate Crawford exposes how extractive the AI industry is. In contrast to the images we associate with it — innovation, virtual (non-physical), machine-led (independent of humans) — AI is actually embodied and material, reliant on an ecosystem that consumes raw materials and practises exploitative labour.
Crawford describes how AI requires rare resources like lithium for batteries, often found in developing countries and conflict zones, and latex from South-East Asia — with concerning environmental impacts. AI is also reliant on outsourced, manual data labelling and classification, usually done by people from poor countries paid a pittance, as with Amazon’s Mechanical Turk (and the aforementioned OpenAI labellers and Facebook moderators). Or it is reliant on indirect labour, whereby humans don’t even realise they are contributing to training AI — like Google’s reCAPTCHA feature.
With the latest version of AI that’s now sweeping the globe, ChatGPT, we are in danger of once again reinforcing the myth of AI as something disembodied and free from human intervention, a unique and novel thing that genius tech bros have come up with in isolation.
Importantly, we are again in danger of allowing a small group of big tech companies to capitalise and profit from the latest technology built off the work of millions of people, without acknowledging the extractive nature of those initiatives.
So, if AI does take over our jobs, making our skills obsolete or severely disrupting our industries, we shouldn’t be surprised. We only have ourselves to blame for allowing these harmful big tech practices to continue unchallenged — or worse, for actively contributing to them.