

Are the days of handwritten mathematics coming to an end?
Laborant / Alamy
In March 2025, mathematician Daniel Litt made a bet. Despite the march of progress of artificial intelligence in many fields, he believed his subject was safe, wagering with a colleague that there was only a 25 per cent chance an AI could write a mathematical paper at the level of the best human mathematicians by 2030. Only a year later, he thinks he was wrong. “I now expect to lose this bet,” he declared on his blog.
Mathematicians have been taken aback by the speed of improvements in AI’s ability to solve problems and produce proofs. “A couple of years ago, they were basically useless for even solving high school math problems, and now they can sometimes solve problems that really appear in the research life of a mathematician,” says Litt, who is at the University of Toronto.
This progress is faster than many had predicted, with mathematicians warning that their profession is undergoing one of the fastest evolutions the field has ever seen. “We are running out of places to hide,” wrote Jeremy Avigad at Carnegie Mellon University in Pennsylvania in a recent essay. “We have to face up to the fact that AI will soon be able to prove theorems better than we can.”
It isn’t a single event that is causing reactions like this, but the steady mathematical progress AIs are demonstrating. Last year, companies like OpenAI and Google DeepMind achieved gold-medal performances on the International Mathematical Olympiad, an elite competition for high-school students that many experts had previously written off as being beyond the purview of AI tools. In January, mathematicians began using similar tools to solve long-standing problems posed by the Hungarian mathematician Paul Erdős.
Now, in two separate developments, AI has begun to take on more complex mathematics, solving real research problems and helping to automatically verify cutting-edge proofs, which traditionally could take a huge amount of work from teams of mathematicians.
In February, Nikhil Srivastava at the University of California, Berkeley, and his colleagues started the First Proof project in an attempt to find a more realistic benchmark with which to test AI’s mathematical aptitude. The first round of the project consisted of 10 problems that researchers had needed to solve as part of their work, from wildly different mathematical fields.
“They were naturally occurring problems in our day-to-day research,” says Srivastava. “They’re drawn from a typical distribution of difficulty. They weren’t super hard, but they weren’t routine either. There was really a range.”
Proof of progress
After the problems were made public, solutions began to pour in. Researchers at tech companies, including OpenAI and Google DeepMind, were among those who attempted to solve the First Proof problems with their own AI models. OpenAI claims it answered half of the problems correctly, according to “feedback from experts”, while Google DeepMind scored 6 out of 10, according to mathematicians it consulted for each problem.
“Things have changed so fast,” says Thang Luong at Google DeepMind. “For us, now AI has really become a serious collaborator, either to produce serious research work or, in the case of First Proof, it can also actually propose a solution by itself.”
Google’s AI maths tool, called Aletheia, uses a computationally intensive version of its Gemini AI chatbot, paired with a verification algorithm to look for flaws in possible solutions. It can then iteratively produce improvements by itself until it arrives at an answer. Google hasn’t publicly disclosed how many iterations Aletheia needed to solve those problems, which makes it difficult to assess just how good it is, but mathematicians are still impressed.
Not all the problems were unanimously agreed as being solved. With problem 8, for example, which is in a niche area of geometry, only five of the seven experts that Google asked agreed that the proposed solution was correct. Ivan Smith at the University of Cambridge, who wasn’t involved in the Google effort, says the AI does appear to be taking a sensible approach to this problem and shows good progress. “If this was a PhD student coming back with their thoughts, it would be encouraging and would build confidence that the result was actually true,” says Smith.
This highlights a problem with AI-generated proofs – checking them is hard work. It would be easy to get into a situation where AI can generate proofs faster than humans can check them. If a theorem is proved by an AI, but nobody is around to check it, has it been proved? AI can help here, too.
The technology is quickly improving at translating handwritten proofs in natural language, like the problems posed in First Proof, into a format that can be checked by a computer, a process called formalisation.
The AI company Math, Inc. recently took mathematicians by surprise by announcing that its AI tool, called Gauss, had formalised an award-winning proof and verified it was correct. The proof concerned how many spheres can be packed into a space and was the subject of Maryna Viazovska’s 2022 Fields medal, often called the Nobel prize of mathematics.
The effort to formalise Viazovska’s work began with a small group of mathematicians at the end of 2024, working separately from Math. Inc, who hoped to manually translate the problem into computer code. They first looked at Viazovska’s sphere packing-solution in eight dimensions. While they were making steady progress, Math, Inc., which had later provided assistance to the researchers, announced it already had a full proof, and then, a more general version of a result for 24 dimensions.
Bhavik Mehta at Imperial College London and his colleagues had originally sketched out a rough plan of how to formalise the work, as well as come up with important mathematical definitions. Without this, the AI couldn’t have completed its proofs, says Mehta.
“We had made all the pieces, but we hadn’t written the instruction manual that explains how to put them together,” says Chris Birkbeck at the University of East Anglia, UK, also part of the team.
A new style of mathematician
The final proof was around 200,000 lines of code, which constitutes about 10 per cent of all existing formalised mathematics. Although it is likely that this code is about 10 times longer than a human would have produced to do the same task, it is still a huge achievement, says Johan Commelin at Utrecht University in the Netherlands. “This is a big deal. This is Fields medal-winning work, and it’s being auto-formalised.”
Similar efforts should now be possible for a large number of other fields, says Commelin, which could transform how mathematics is practised. “The future that we’re all thinking of is that we’ll have tooling that will automatically formalise new research and mathematical papers, and flag whether there are mistakes or not,” says Commelin. “This will have huge implications for, say, peer review and refereeing work.”
Faced with a future in which an increasing share of mathematics is done by AI, some mathematicians, like Avigad, are raising the alarm about the detrimental effects this might have on our ability to practise and come up with new mathematics.
Using machines to solve the types of problem posed in First Proof may produce concrete proofs, says Anna Marie Bohmann at Vanderbilt University in Tennessee, but we lose the “learning opportunity”, she says. “Struggling to create and formulate new ideas and to solve new problems is one of the main ways in which both students and mathematics professionals consolidate their knowledge.”
Tony Feng, one of the Aletheia team at Google DeepMind, feels similarly and is cautious about using the tool himself. “A lot of times I feel like I should be doing my own homework and going through the process of building my own intuition.”
Even formalising proofs can generate important insights, says Mehta, and he and his colleagues will now need to untangle the 200,000-line AI proof to work out what might be useful for other projects.
But mathematicians are still hopeful there will be a place for them in an increasingly machine-led future. Looking to history, Commelin says that manual calculations were once a large part of being a mathematician, but they are now done automatically. “I think similar things will happen here, where we radically change what we’re doing, but 10 or 20 years from now, we will still recognise what we’re doing as mathematics, in a new style.”
Topics:
- artificial intelligence/
- mathematics


