What will AI do to employment? It is, after “will it kill us all?”, the most important question about the technology, and it’s remarkably hard to pin down – even as the frontier moves from science fiction to reality.
At one end of the spectrum is the slightly Pollyannaish claim that new technology simply creates new jobs; at the other, fears of businesses replacing entire workforces with AI tools. Sometimes the dispute is less about the end state than about the speed of the transition: an upheaval completed in a few years is destructive for those caught in the middle of it, in a way that one spread over two decades might not be.
Even analogies to the past are less clear than we might like. The internal combustion engine put an end to working horses – eventually. But the steam engine did the opposite, vastly increasing the number of pack animals employed in the UK. Why? Because the railways led to a boom in goods being shipped around the country, but couldn’t complete the delivery from depot to doorstep. Horses were needed to do the things the steam engine couldn’t.
Until they weren’t.
Steam power and the internal combustion engine are examples of general purpose technologies, breakthroughs that reshape the entire structure of society. There haven’t been many, even if you start the count at writing – or, before that, at fire itself. It is, I believe, a complete coincidence that the term “generative pretrained transformer” has the same initials, and so GPTs appear to be a GPT.
It’s not the jobs, stupid
People aren’t horses [citation needed]. It seems implausible that AI technology will ever be able to do absolutely everything a human can do, because some of what a human can do is be a human, an inconveniently circular claim but an important one. Horses still run in horse races, because if you replace a horse with a car it’s not a horse race [citation needed]; people will still provide those services which, for whatever reason, people want people to provide. As culture warps around the rise of AI, what some of those services are might surprise us. AI in healthcare is underrated, for instance, because for a lot of people “the human touch” is a bad thing: it’s the doctor who you worry is judging your drinking or the therapist who you lie to because you want them to like you.
As a result, lots of people like to think not about jobs, but about “tasks”. Take a job, define it in terms of the tasks it involves, and ask whether an AI can do those. That way, you identify a few that are at risk of complete cannibalisation, a few that are perfectly safe and a large middle that is going to be “affected” by AI, however that shakes out.
It’s worth flagging the obvious: that approach is mechanically going to produce a large number for jobs “affected” and a small number for jobs “destroyed”. (Even the most AI-impacted job likely has some tasks that AI finds hard.) That might be why it is a methodology pioneered by OpenAI. In a 2023 paper, researchers affiliated with the lab estimated “that 80 per cent of workers belong to an occupation with at least 10 per cent of its tasks exposed to LLMs, while 19 per cent of workers are in an occupation where over half of its tasks are labeled as exposed”.
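To see why the method tilts that way, here is a minimal sketch – entirely my own, with invented occupations, tasks and labels rather than anything from the OpenAI or TBI papers – of how task-level exposure labels roll up into those headline percentages, in Python:

# Toy illustration of rolling task-level "exposure" labels up into headline
# job figures. The occupations, tasks and labels are invented, not taken
# from the OpenAI or TBI papers.

occupations = {
    "legal secretary": [("draft letters", True), ("file court documents", True),
                        ("schedule hearings", True), ("greet clients", False)],
    "plumber": [("diagnose faults", False), ("fit pipework", False),
                ("order parts", True), ("invoice customers", True)],
    "journalist": [("research a topic", True), ("interview sources", False),
                   ("write copy", True), ("check facts", True)],
}

def exposure_share(tasks):
    """Fraction of an occupation's tasks labelled as exposed to AI."""
    return sum(exposed for _, exposed in tasks) / len(tasks)

affected = [job for job, tasks in occupations.items() if exposure_share(tasks) >= 0.10]
heavily_exposed = [job for job, tasks in occupations.items() if exposure_share(tasks) > 0.50]
fully_exposed = [job for job, tasks in occupations.items() if exposure_share(tasks) == 1.0]

print(f"'affected' (at least 10% of tasks exposed): {len(affected)} of {len(occupations)}")
print(f"over half of tasks exposed: {len(heavily_exposed)} of {len(occupations)}")
print(f"fully exposed: {len(fully_exposed)} of {len(occupations)}")

Because almost every occupation contains at least one exposed task, the “affected” bucket fills up almost by construction, while the “fully exposed” bucket stays small – which is the mechanical point above.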
The paper argued that between 15 and 86 occupations were “fully exposed” – 15 when humans did the labelling, 86 when GPT-4 did – including mathematicians, legal secretaries and … journalists.
I’m still here. But a year on, the idea is back in the news thanks to a paper from the Tony Blair Institute (TBI). The mega thinktank was powerful and influential even before the landslide Labour victory two weeks ago; now, it’s seen as one of the architects of Starmerite thought. And it thinks the public sector is ripe for AI disruption. From the institute’s paper, The Potential Impact of AI on the Public-Sector Workforce (pdf):
More than 40 per cent of tasks performed by public-sector workers could be partly automated by a combination of AI-based software, for example machine-learning models and large-language models, and AI-enabled hardware, ranging from AI-enabled sensors to advanced robotics.
The government will need to invest in AI technology, upgrade its data systems, train its workforce to use the new tools and cover any redundancy costs associated with early exits from the workforce. Under an ambitious rollout scheme, we estimate these costs equate to £4bn per year on average over this parliamentary term.
For the last couple of weeks, TechScape has been casting its eye over the new government’s approach to AI. Tomorrow, we’ll find out quite a bit more, with an AI bill expected in the King’s speech. The TBI paper gives us one anchoring point to look for: will investment in the transformation come anywhere close to £4bn a year? A lot can be done for free, but a lot more can be done with substantial money. The spend pays off at more than 9:1 in the institute’s estimates; but a £20bn bill – £4bn a year across a five-year parliament – is hard to smuggle through parliament without questions.
AI wonks
Over the weekend, the report had a second wave of interest, after critics took issue with the methodology. From 404 Media:
The problem with this prediction, which was picked up by Politico, TechRadar, Forbes and others, is that it was made by ChatGPT after the authors of the paper admitted that making a prediction based on interviews with experts would be too hard. Basically, the finding that AI could replace humans at their jobs and radically change how the government works was itself largely made by AI.
“There is no validation in this method that a language model is good at working out what is, in principle, able to be automated,” Michael Veale, associate professor at University College London, told me. “Automation is a complex phenomenon – in government it involves multiple levels of administration, shared standards, changing legislation, very low acceptable cost of failure. These tasks do not exist in isolation, but are part of a much broader set of practices and routines.”
Breaking down jobs into tasks has already been done, with a vast database – O*NET – created by the US Department of Labor. But with 20,000 such tasks, deciding which are exposed to AI is heavy work. In OpenAI’s earlier paper, “the authors personally labeled a large sample of tasks and DWAs [detailed work activities] and enlisted experienced human annotators who have reviewed GPT-3, GPT-3.5 and GPT-4 outputs as part of OpenAI’s alignment work”, but they also enlisted the then-new GPT-4 to do the same labelling, and found between 60 and 80 per cent agreement between the robot and the humans.
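As a rough illustration of what that agreement figure measures – a sketch of mine with made-up tasks and labels, not data from either paper – the simplest version is just the share of tasks where the model’s label matches the human one:

# Toy agreement check between human and model exposure labels for the same
# tasks. Task names and labels are invented for illustration.

human_labels = {
    "prepare committee minutes": "exposed",
    "inspect water mains": "not exposed",
    "answer benefit queries": "exposed",
    "restrain a detainee": "not exposed",
    "summarise case files": "exposed",
}

model_labels = {
    "prepare committee minutes": "exposed",
    "inspect water mains": "not exposed",
    "answer benefit queries": "not exposed",  # the model disagrees here
    "restrain a detainee": "not exposed",
    "summarise case files": "exposed",
}

matches = sum(model_labels[task] == label for task, label in human_labels.items())
print(f"agreement: {matches}/{len(human_labels)} = {matches / len(human_labels):.0%}")
# -> agreement: 4/5 = 80%

The real exercise uses a finer-grained exposure rubric rather than a yes/no label, but the principle is the same: the model is being checked against people, not against ground truth.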
The TBI paper skipped the experts and just put its questions to AI to answer. After a flurry of attention, the paper was quietly updated with an eight-page appendix defending the choice:
Clearly there are trade-offs between the different methods. None is perfect. Greater reliance on human judgment can limit the analysis to a broader categorisation of tasks with less specificity over time savings. On the other hand, pursuing a more detailed categorisation typically involves relying more on AI to support the assessment.
But dropping the human labellers wasn’t the only difference between OpenAI’s paper and the TBI follow-up. The wonks also used a vastly more detailed prompt, encouraging the AI system to consider, in detail, the cognitive and physical labour involved in a given task before judging whether AI could do it, and then posing follow-up questions to ensure that only tasks which are practically automatable actually get counted.
This is “prompt engineering” in action, with the AI system being encouraged to take a step-by-step reasoning approach to improve its answers. It’s also an example of what’s called “overhang”: the researchers used the same GPT-4 model in both instances, but by getting better at working with it, the TBI team were able to get better work from it.
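As a sketch of what that difference looks like in practice – my own illustration using the openai Python client, not the TBI’s actual prompt or code – the bare question and the step-by-step version of the same question might be sent like this:

# Illustrative contrast between a bare prompt and a step-by-step prompt for
# the same task, using the openai Python client. Both prompts are my own
# invention; the TBI's real prompt is far longer and more detailed.
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

TASK = "Prepare minutes of local planning committee meetings."

bare_prompt = f"Can AI automate this task? Answer yes or no.\nTask: {TASK}"

detailed_prompt = f"""You are assessing whether a public-sector task could be automated.
Task: {TASK}

Think step by step:
1. Describe the cognitive work involved (reading, judgment, drafting).
2. Describe any physical work involved.
3. Say whether current AI software or AI-enabled hardware could do each part.
4. Say whether automation would be practical, given accuracy, oversight and legal constraints.
Finish with one line: VERDICT: fully / partly / not automatable."""

for name, prompt in [("bare", bare_prompt), ("step-by-step", detailed_prompt)]:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {name} ---\n{response.choices[0].message.content}\n")

Same model, same task; the longer prompt simply forces the model through the reasoning the researchers care about before it commits to a verdict. That is the “overhang” point: the capability was there all along, waiting on a better question.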
As the dust settles, the new appendix might be the most important part of the whole paper. The top-level findings are probably, broadly, true, because GPT-4 is very good at spitting out text that is probably, broadly, true. Doubtless, if someone had the time to dig through the many thousands of pages of text it produced in labelling those tens of thousands of tasks, they would find inaccuracies, cliches and straight-up hallucinations. But at the scale of the study, they don’t matter.
And neither do the findings. “Some but not all public sector tasks could be automated by an AI” is a fairly easy claim. Putting a number on it helps argue for investment, but you’d be a fool to bet that “40 per cent” is any more accurate than 50 or 30 per cent.
Instead, the paper is teaching by doing. You want to know how AI will affect government and politics? Well, there it is in action. A paper was produced at a fraction of the cost it would once have taken, but presented to an audience where the very method of its creation casts doubt on its findings.
Rinse and repeat for a further 8,000 tasks, and you’re quite a lot closer to understanding the impact of AI on jobs – and to seeing that it’s going to be anything but a clean transition.