Even though computers were made to do maths faster than any human could manage, the top level of formal mathematics remains an exclusively human domain. But a breakthrough by researchers at Google DeepMind has brought AI systems closer than ever to beating the best human mathematicians at their own game.
A pair of new systems, called AlphaProof and AlphaGeometry 2, worked together to tackle questions from the International Mathematical Olympiad, a global maths competition for secondary-school students that has been running since 1959. The Olympiad takes the form of six mind-bogglingly hard questions each year, covering fields including algebra, geometry and number theory. Winning a gold medal places you among the best handful of young mathematicians in the world.
The combined efforts of DeepMind’s two systems weren’t quite in that league. After their answers were marked by Prof Timothy Gowers – a winner of the mathematics equivalent to the Nobel prize, the Fields medal, and a gold medalist in the olympiad himself – the DeepMind team scored 28 out of 42 – enough for a silver medal, but one point short of gold.
Unlike a human mathematician, the systems were either flawless or hopeless. In each of the questions they solved, they scored perfect marks, but for two out of the six questions, they were unable to even begin working towards an answer. Moreover, DeepMind, unlike humancompetitors, was given no time limit. While students get nine hours to tackle the problems, the DeepMind systems took three days working round the clock to solve one question, despite blitzing another in seconds.
The two systems that worked on the challenge were very different from each other. AlphaProof, which solved three of the problems, works by pairing a large language model – of the sort applied in consumer chatbots – with a specialist “reinforcement learning” approach, like that used by DeepMind to tackle the board game Go. The trick is in leveraging a pre-existing approach called “formal mathematics”, a set of rules that lets you write a mathematical proof as a program that can run only if it is true.
“What we try to do is to build a bridge between these two spheres,” said Thomas Hubert, the lead on AlphaProof, “so that we can take advantage of the guarantees that come with formal mathematics and the data that is available in informal mathematics.” After it was trained on a vast number of maths problems written in English, AlphaProof used its knowledge to try to generate specific proofs in the formal language. Because those proofs can be verifiably true or not, it is possible to teach the system to improve itself. The approach can solve difficult problems, but isn’t always fast at doing so: while it is far better than simple trial and error, it took three days to find the correct formal model for one of the hardest questions in the challenge.
The other system, AlphaGeometry 2, similarly pairs a language model with a more mathematically inclined approach. But its success at the narrower field of geometry problems was startling: it solved its problem in just 16 seconds. And, Gowers says, chose a surprising route to do so. “There have been some legendary examples of [computer-aided] proofs that are longer than Wikipedia. This was not that: we’re talking about a very short, human-style output.”
The lead on AlphaGeometry 2, Thang Luong, described the output as similar to the famous “move 37” in DeepMind’s historic victory at Go, when the AI system made a move no human would have thought of, and went on to win. AlphaGeometry 2’s proof involved constructing a circle around another point, and then using that circle to prove the overall answer. “At first, our expert didn’t quite understand why it constructed that point at all,” said Luong. “But after looking at the solution, it really connects many triangles together, and they thought that the solution was really quite elegant.”
AlphaGeometry 2’s easiest question …
Let ABC be a triangle with AB < AC < BC. Let the incentre and incircle of triangle ABC be I and ω, respectively. Let X be the point on line BC different from C such that the line through X parallel to AC is tangent to ω. Similarly, let Y be the point on line BC different from B such that the line through Y parallel to AB is tangent to ω. Let AI intersect the circumcircle of triangle ABC again at P ≠ A. Let K and L be the midpoints of AC and AB, respectively.
Prove that ∠KIL + ∠YPX = 180◦.
Solved in 19 seconds.
… and AlphaProof’s hardest one
Turbo the snail plays a game on a board with 2024 rows and 2023 columns. There are hidden monsters in 2022 of the cells. Initially, Turbo does not know where any of the monsters are, but he knows that there is exactly one monster in each row except the first row and the last row, and that each column contains at most one monster.
Turbo makes a series of attempts to go from the first row to the last row. On each attempt, he chooses to start on any cell in the first row, then repeatedly moves to an adjacent cell sharing a common side. (He is allowed to return to a previously visited cell.) If he reaches a cell with a monster, his attempt ends and he is transported back to the first row to start a new attempt. The monsters do not move, and Turbo remembers whether or not each cell he has visited contains a monster. If he reaches any cell in the last row, his attempt ends and the game is over.
Determine the minimum value of n for which Turbo has a strategy that guarantees reaching the last row on the nth attempt or earlier, regardless of the locations of the monsters.
Unsolved.