The Turing test for measuring A.I. intelligence is outdated because of ChatGPT’s wizardry, and a new test would be better, DeepMind cofounder says

(Credit: Marlene Awaad—Bloomberg/Getty Images)

Alan Turing, the trailblazing mathematician played by Benedict Cumberbatch in The Imitation Game (2014), is considered the “father of artificial intelligence.” One of his legacies is the Turing test, which measures whether a machine can think like a human and has long been a North Star in the artificial intelligence field. In the test, a human judge holds text conversations with both a person and a machine and tries to determine which participant is the human. Turing, who proposed the test in 1950, originally called it the “imitation game” because it gauges a machine’s intelligence by how indistinguishable it is from a human.

An A.I. program called Eugene Goostman first passed the Turing test in 2014, and newer chatbots such as ChatGPT and Google’s LaMDA have passed it more recently. Considering the increasing sophistication of generative A.I. chatbots, the test is outmoded, and machine intelligence demands a new lodestar, according to Mustafa Suleyman, cofounder of the A.I. lab DeepMind. Suleyman, who sold DeepMind to Google for $650 million in 2014, is among the few A.I. founders to have already made his millions in the field.

“Large language model” chatbots like ChatGPT, Microsoft’s Bing, and Google’s Bard can communicate fluidly in text conversations after being trained on human-produced online content. Many chatbots use natural language processing—the interdisciplinary use of computing and linguistics—to understand large amounts of ordinary language much like a human would, taking context into account.

“It’s totally unclear whether [the Turing test] is a meaningful milestone or not,” Suleyman told Bloomberg. “It doesn’t tell us anything about what the system can do or understand, anything about whether it has established complex inner monologues, or can engage in planning over abstract time horizons, which is key to human intelligence.”

The Turing test’s utility is fading because chatbots can now emulate human writing without any real understanding, simply by generating the most likely series of words. In his upcoming book, The Coming Wave: Technology, Power, and the Twenty-first Century’s Greatest Dilemma, Suleyman argues for a new gold standard for measuring A.I. sophistication.

Suleyman’s proposed gauge, which he calls the “modern Turing test,” is based on entrepreneurship. The test asks an A.I. to turn $100,000 in seed money into $1 million by devising an original product idea, creating an e-commerce business plan, finding manufacturers, and listing the item for sale. This test would allow evaluators to measure an A.I.’s ability to set goals, plan, and execute complex tasks.

While the original Turing test lets chatbots skate by with high-level imitation, Suleyman’s model would require machines to exhibit the advanced strategy and reasoning skills that make human intelligence unique. The Turing test’s goal is to see whether a machine is at least as intelligent as a human, but determining that requires testing a machine on more than its conversational abilities.

“We don’t just care about what a machine can say; we also care about what it can do,” Suleyman writes in The Coming Wave.

It took a machine 64 years to pass the Turing test, but Suleyman thinks A.I. will pass his proposed update to the Turing test within the next two years.

If that happens, “the consequences for the world economy are seismic,” he told Bloomberg.

Suleyman and Crown Publishing, the publisher of his book, were not immediately available for comment.
