On a free evening last October, Mehtaab Sawhney took up an old pastime: browsing the website erdosproblems.com, an up-to-date record of the 1,179 conjectures left by the prodigious and indefatigable 20th-century mathematician Paul Erdős.
Sawhney, a mathematician at Columbia University, had always been interested in Erdős problems, which range from small curiosities to central open questions in number theory and combinatorics.
He came across one problem, #339, that seemed too straightforward to still be "open" nearly three decades after Erdős's death. He had seen problems like it before. "There are many such problems that look very accessible," Sawhney says. In the past he has turned to Google. "And then eventually, with enough searching, I'll find a reference to the solution."
But recently he has been trying ChatGPT as a new way of checking the literature. "I decided to plug it in, and then it told me there was a reference," Sawhney says.
It went so well that he reached out to a fellow mathematician, Mark Sellke, who had recently taken leave from an academic post to work for OpenAI. Together they led ChatGPT to find forgotten solutions to nine more Erdős problems, as well as partial solutions to 11 others.
Since then, activity on the website has skyrocketed. According to a webpage started by mathematician Terence Tao, AI tools have helped move nearly 100 of Erdős's problems into the "solved" column since October. The bulk of this help has been a kind of souped-up literature search, as with Sawhney's initial success. But in many cases, LLMs have pieced together existing theorems, often in conversation with the mathematicians prompting them, to create new or improved solutions to these problems. In at least two cases, an LLM even produced an original, valid proof for a problem that had never been solved, with very little input from a human.
The story of the Erdős problems is part of a larger shift that has taken place over the past few months. LLMs have become unmatched in their ability to survey and synthesize the literature on any mathematical topic, no matter how esoteric. They can also guide working mathematicians, helping them chart a path to a larger result and proving smaller pieces of it to save time. This aid is often flawed and riddled with holes that require expert eyes to catch. But mathematicians can see its potential.
"They are now useful research aids," says Andrew Sutherland, a mathematician at the Massachusetts Institute of Technology. "Mathematicians whose only experience with LLMs is with earlier models don't yet fully appreciate this."
AI is not yet capable of solving major open problems in mathematics, let alone replacing mathematicians. Despite widespread worries voiced by graduate students during conference coffee breaks and on online message boards, no major mathematics journal has yet published a peer-reviewed proof citing the use of an LLM. But that may change as soon as this year.
Assessing the State of Things
Erdős problems are a useful LLM "benchmark" because there are so many of them. And they have proved to be a striking demonstration of the technology's growing power as a mathematical search engine.
"Erdős problems fit into a category of their own," Sutherland says. "For the most part, they are individual problems whose solution will have no widespread impact." As a result, solving a more obscure Erdős problem is an achievement that often goes unnoticed: it is rarely worth publishing in a journal and is rarely cited in later work.
None of this matters to an LLM, which can easily dig up papers and preprints unknown even to experts, including proofs that do not reference Erdős at all. Google's Gemini found a passing comment in a 1981 paper that inadvertently solved Erdős problem #1089. But what is more surprising is LLMs' ability to offer meaningful mathematical suggestions.
"I think it's a mistake to say it's 'just a search engine,'" Sutherland says. "I've had one or two conversations where it actually pointed me toward a result that gave me a way to prove something I was stuck on."
Similar experiences inspired the team behind First Proof, a new effort to test AI's math skills. Eleven top mathematicians each chose pieces of proofs they had completed but not yet published and presented them as a challenge to AI last Thursday. The problems span a wide range of areas and vary in difficulty. "A system that could solve all of these would be very useful to a professional mathematician," says Daniel Litt, a mathematician at the University of Toronto.
The team is giving LLM users until Friday to submit proofs of the 10 problems. According to Harvard University mathematician Lauren Williams, a member of the First Proof team, the one-week time frame was deliberately chosen: it is less time than it took her and her coauthors to find the proofs themselves, so it is probably not enough for human mathematicians working without AI help.
By Monday, Williams and his colleagues’ e-mails and social media pages were filled with claimed solutions. “There’s a lot of excitement out there, which is really great to see,” she says. A Discord server hosting discussions on the challenge has quickly garnered hundreds of members, many of whom have purported testimonials from ChatGPT and other LLMs.
Familiar problems have already arisen. First Proof was meant to be more than a literature search; the team tested their questions on LLMs to make sure the answers did not already exist in their training data. But a solution quickly surfaced online to the problem posed by Martin Hairer, winner of the 2014 Fields Medal, mathematics's highest honor, and a member of the First Proof team. When he posed the problem, he had overlooked a partial proof on his personal website that had been archived by the Wayback Machine.
And competitors who lack the team's expertise in these specific mathematical areas cannot be sure what to make of the flood of confident claims their LLMs keep producing; it falls to the First Proof team to check every submission. "Verification is a problem, because 90 percent of the time it will produce a solution," Williams says, one that looks like "something to write home about and seems confident."
Litt has looked at many of the "proofs" circulating this week and found them largely bogus, though he has also seen some that may be correct. "It's absolutely impressive that models are sometimes able to give correct answers to some problems," he says. "But they are generating huge amounts of waste." Even by Saturday it will not be entirely clear whether the LLMs have won or lost.
An Important Year
Despite the First Proof results, the past month has brought several signs that LLMs will soon be part of many mathematicians' tool chests.
In January, Ravi Vakil, current president of the American Mathematical Society, posted a preprint with two other mathematicians and two researchers from Google in which they collaborated to solve a math problem arising from his research. The authors document how Google's LLM helped them reach the proof. "It really led us to new ideas," says Vakil, who wants to figure out how mathematicians can appropriately be doing mathematics five years from now.
Still, LLMs have not yet produced a proof that would merit much discussion had it come from a human. "Each individual result has been highly publicized in certain corners of the Internet," Litt says. Carlo Pagano, who collaborated with Google DeepMind on many Erdős problems using Gemini in research posted as a preprint, also hopes for a more significant benchmark. "The Erdős problems are not big, in some sense," he says. "It's important to do this on problems that we know are of broad interest."
But many mathematicians predict that 2026 will be the year when results of this kind, with AI as a declared contributor, make it through peer review at major mathematics journals for the first time.
"I think it will change the subject," Sawhney says. "And that's a really exciting thing." Anticipating that change, Sawhney has taken academic leave from Columbia to work for OpenAI. This week Pagano started a joint position at Google DeepMind. "It's clear that this will change the way we do math," he says, "so it's better to start early rather than late."
