In 2017, researchers at the British AI company DeepMind (now Google DeepMind) published an extraordinary paper describing how their new algorithm, AlphaZero, had taught itself to play a number of games to superhuman standards without any instruction. The machine could, they wrote, “achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.”
Speaking afterwards at a big machine-learning conference, DeepMind’s chief executive, Demis Hassabis (himself a world-class chess player), observed that the program often made moves that would seem unthinkable to a human chess player. “It doesn’t play like a human,” he said, “and it doesn’t play like a program. It plays in a third, almost alien, way.” It would be an overstatement to say that AlphaZero’s capabilities spooked those who built it, but it clearly surprised some of them. It was, as one of them privately noted later, a bit like putting your baby daughter to sleep one evening and finding her solving equations in the morning.
That was six years ago. Spool forward to now, when a friend of mine is experimenting with GPT-4, OpenAI’s most powerful large multimodal model (accepting image and text inputs, outputting text) – the version to which you can get access for $20 (about £16) a month. He asks the model to consider a scenario in which Bob enters a room and finds a parrot in a cage; Bob covers the cage completely with an opaque cloth and then leaves the room. Then Alice enters the room and places the cage, still covered with the cloth, into a cupboard that she closes. She then leaves the room. “Where,” my friend asks the model, “does each of the actors in this story believe the parrot to be?”
GPT-4 responds that, “based on the information provided”, Bob would believe the parrot to be in the covered cage somewhere in the room. “Since he left before Alice came and moved the cage into the cupboard, he has no knowledge of that act.” Alice, for her part, “would believe the parrot to be in the covered cage inside the cupboard”.
So far, so good. But the bit I really like comes next. “The parrot,” says GPT-4, “unless it has a very advanced understanding of human actions, will likely be aware only of its immediate surroundings and not have a clear concept of ‘being in the cupboard’. It would sense that it’s in a dark and confined space (the covered cage) but, beyond that, it’s hard to say what a parrot’s perception or understanding would be.”
The program also adds a rider that “this is a simplified explanation and the actual beliefs of Bob and Alice could be influenced by additional factors not mentioned in the scenario, such as prior knowledge, assumptions, or other contextual clues”.
Now, I know what you’re thinking. The puzzle posed by my friend wasn’t a particularly challenging one. A five-year-old human could probably solve it – though perhaps without formulating the concluding caveat. Even so, my friend’s interaction with the machine neatly undermines one of the critical assumptions many of us made when these large language models first broke cover – that they would not be capable of reasoning. After all, we argued, they are just “stochastic parrots” – machines that make statistical guesses about the next most likely word in a sentence based on the vast database of sentences they have ingested during training. But if GPT-4 is indeed such a parrot, then it’s a bird that can do at least some reasoning.
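For readers who want a feel for what “stochastic parrot” means in practice, here is a minimal sketch of next-word prediction from statistics alone. It is purely illustrative: the tiny corpus is invented, and a word-count bigram table stands in for the neural networks over tokens that models like GPT-4 actually use, but the underlying idea of guessing the most likely continuation from prior data is the same.

```python
import random
from collections import Counter, defaultdict

# Toy "stochastic parrot": learn how often each word follows another in a
# tiny corpus, then generate text by sampling a statistically likely next word.
# Illustrative only -- real large language models use neural networks over
# tokens, not word-count tables.

corpus = "the parrot is in the cage the cage is in the cupboard the parrot is in the room"
words = corpus.split()

# Count next-word frequencies (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1

def next_word(word):
    # Sample the next word in proportion to how often it followed `word`.
    counts = following[word]
    choices, weights = zip(*counts.items())
    return random.choices(choices, weights=weights)[0]

# Generate a short continuation starting from "the".
word = "the"
output = [word]
for _ in range(8):
    if not following[word]:
        break  # no observed continuation; stop generating
    word = next_word(word)
    output.append(word)
print(" ".join(output))
```

Run it a few times and it produces plausible-looking but meaning-free strings such as “the cage is in the cupboard the parrot is”; the open question my friend’s experiment raises is how far scaling that basic trick up by many orders of magnitude produces something that looks like reasoning.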
Unsurprisingly, then, researchers have been scrambling to find out how good GPT-4 and its peers are at logic by running them against classic tests of reasoning ability. The most recent study I’ve seen concludes that GPT-4 performs “relatively well” on established benchmarks but finds certain kinds of tasks “challenging”. “Room for improvement” might be the verdict for the moment. But given the frenzied pace of development in this technology, it will get better with time.
At the back of all this, of course, is the $64tn question: are these models a stepping stone to AGI (artificial general intelligence) – “superintelligent” machines? Conventional wisdom says no, because while they may be smart, they don’t have any knowledge of the world in all its complexity. But what does seem beyond doubt is that they are increasingly capable. “GPT-4,” concludes a recent Microsoft study, for example, “can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT.” We need to watch this space.
What I’ve been reading
Media evolutions
In an interesting essay in Noema, The New Media Goliaths, the formidable Renée DiResta outlines how our media ecosystem has radically changed.
Boris judgment
Legal scholar Mark Elliott asks, in a splendid blogpost, Was Boris Johnson undemocratically removed from parliament? Definitely not, he concludes.
AI rabbit hole
In her terrific essay Talking About a ‘Schism’ Is Ahistorical, Emily Bender asserts that the discourse about the “existential risk” posed by AI avoids the really important questions.