In all the frenzied discourse about large language models (LLMs) such as GPT-4 there is one point on which everyone seems to agree: these models are essentially stochastic parrots – namely, machines that are good at generating convincing sentences, but do not actually understand the meaning of the language they are processing. They have somehow “read” (that is, ingested) everything ever published in machine-readable form and create sentences word by word, at each point making a statistical guess of “what one might expect someone to write after seeing what people have written on billions of webpages, etc”. That’s it!
Ever since ChatGPT arrived last November, people have been astonished by the capabilities of these parrots – how humanlike they seem to be and so on. But consolation was drawn initially from the thought that since the models were drawing only on what already resided in their capacious memories, they couldn’t be genuinely original: they would just regurgitate the conventional wisdom embedded in their training data. That comforting thought didn’t last long, though, as experimenters kept finding startling and unpredictable behaviours in LLMs – capacities now labelled “emergent abilities”.
From the beginning, many people have used LLMs as aids to brainstorming. Ask one of them for five ways to reduce your household’s carbon footprint and it’ll come up with a list of reasonable and actionable suggestions. So it’s clear that the combination of human plus LLM can be a creative partnership. But of course what we’d really like to know is whether the machines on their own are capable of creativity.
Ah, but isn’t creativity a slippery concept – something that’s hard to define but that we nevertheless recognise when we see it? That hasn’t stopped psychologists from trying to measure it, though, via tools such as the alternative uses test and the similar Torrance test. And it turns out that one LLM – GPT-4 – beats 91% of humans on the former and 99% of them on the latter. So as the inveterate artificial intelligence user Ethan Mollick puts it: “We are running out of creativity tests that AIs cannot ace.”
Mollick works in a business school (Wharton, based at the University of Pennsylvania) and has been a cheerleader for LLMs from the beginning. Some of his colleagues conducted an experiment with GPT-4 and 200 of their students, setting humans and machine the same challenge: come up with an idea for a product aimed at American college students that would retail for less than $50.
And the results? “ChatGPT-4 generated more, cheaper and better ideas than the students. Even more impressive, from a business perspective, was that the purchase intent from outside judges was higher for the AI-generated ideas as well! Of the 40 best ideas rated by the judges, 35 came from ChatGPT.”
The really illuminating aspect of the study, though, was an inference the researchers drew from it about its economics. “A professional working with ChatGPT-4,” they write, “can generate ideas at a rate of about 800 ideas per hour. At a cost of $500 per hour of human effort, a figure representing an estimate of the fully loaded cost of a skilled professional, ideas are generated at a cost of about $0.63 each… At the time we used ChatGPT-4, the API fee [application programming interface, which allows two or more computer programs to communicate with each other] for 800 ideas was about $20. For that same $500 per hour, a human working alone, without assistance from an LLM, only generates 20 ideas at a cost of roughly $25 each… For the focused idea generation task itself, a human using ChatGPT-4 is thus about 40 times more productive than a human working alone.”
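For readers who like to check the arithmetic, here is a minimal back-of-the-envelope sketch in Python. Every figure ($500 an hour, 800 ideas against 20, a roughly $20 API fee) comes straight from the quote above; the variable names are mine, and the numbers are the researchers’ estimates, not measurements.

```python
# Back-of-the-envelope check of the researchers' cost-per-idea arithmetic.
# Every figure below is taken from the quoted passage; nothing is measured here.

HOURLY_RATE = 500      # fully loaded cost of a skilled professional, in $/hour
IDEAS_WITH_LLM = 800   # ideas per hour for a professional working with ChatGPT-4
IDEAS_ALONE = 20       # ideas per hour for a human working unassisted
API_FEE = 20           # approximate ChatGPT-4 API fee for those 800 ideas, in $

cost_with_llm = HOURLY_RATE / IDEAS_WITH_LLM   # $0.625, the "about $0.63" in the quote
cost_alone = HOURLY_RATE / IDEAS_ALONE         # $25.00 per idea
speed_up = IDEAS_WITH_LLM / IDEAS_ALONE        # 40x, for the idea-generation task only

print(f"With ChatGPT-4: ${cost_with_llm:.3f} per idea "
      f"(plus ${API_FEE / IDEAS_WITH_LLM:.3f} per idea in API fees)")
print(f"Working alone:  ${cost_alone:.2f} per idea")
print(f"Productivity ratio: {speed_up:.0f}x")
```

The API fee, note, barely moves the result: at two and a half cents per idea it is a rounding error next to the cost of the human’s time, which is exactly the point the researchers are making.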
If you want an insight into how corporations will view this technology, you couldn’t do better than this. Reading it brought to mind Ted Chiang’s perceptive New Yorker essay about how AI would in fact be used. “I suggest,” he wrote, “that we think about AI as a management consulting firm, along the lines of McKinsey & Company. Firms like McKinsey are hired for a wide variety of reasons, and AI systems are used for many reasons, too. But the similarities between McKinsey – a consulting firm that works with 90% of the Fortune 100 – and AI are also clear.”
Chiang quotes a former McKinsey employee’s description of the consultancy as “capital’s willing executioners”. If you’re a senior executive who has to take some unpalatable decisions but needs plausible deniability, being able to cite an external consultant – or a new technology? – is a good way to get it. So, says Chiang, as AI becomes more powerful and flexible, the question we should be asking is: is there any way to keep it from being another version of McKinsey? You only have to ask the question to know the answer.
What I’ve been reading
Deutsche courage
Just for Fun is a lovely essay by Rebecca Baumgartner on the 3 Quarks Daily platform about people’s reaction to the news that she’s learning German – for fun!
Hobbes nobbing
AI and Leviathan: Part II is the second in a remarkable series of essays by Samuel Hammond on his Second Best blog.
Man of many words
Henry Oliver’s essay on Substack’s Common Reader blog – Samuel Johnson, Opsimath – is a nice tribute to the Great Cham.