Recent research from OpenAI and partner institutions reveals something unexpected about GPT-5: it's starting to work less like a sophisticated search engine and more like a talented assistant who helps you tackle tough problems.
In a wide-ranging study across mathematics, physics, biology, and computer science, researchers documented cases where the model didn't just retrieve information. It generated novel proofs, identified hidden connections between disparate fields, and compressed months of theoretical work into hours.
Here's what caught my attention: GPT-5 solved four previously unsolved mathematical problems. Not a conjecture, not a suggested approach: it actually solved them.
One of these, Erdős problem #848, had puzzled mathematicians for decades. The AI's contribution? A line of analysis that human mathematicians had overlooked.
But let's pause for a second. This is not about AI replacing scientists. Fields Medalist Timothy Gowers, who was involved in the study, put it well when he compared GPT-5's contribution to that of a knowledgeable research supervisor: helpful, sometimes insightful, but not yet at a level where you'd list it as a co-author on most papers.
The real magic happens in something called “compression factor.” Brian Spears of Lawrence Livermore National Laboratory used GPT-5 to model thermonuclear burn propagation in fusion experiments.
Six hours of collaborative work with the AI accomplished what they estimated would have taken six person-months with a team of postdocs. That's not just efficiency; it's a fundamental change in how research can be done.
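The arithmetic behind that "compression factor" is worth making explicit. A rough sketch, assuming about 160 working hours per person-month (a conversion the article does not state):

```python
# Rough compression-factor arithmetic for the fusion example above.
# Assumption: ~160 working hours per person-month (not stated in the article).
HOURS_PER_PERSON_MONTH = 160

baseline_hours = 6 * HOURS_PER_PERSON_MONTH  # estimated team effort: 6 person-months
assisted_hours = 6                           # actual collaborative time with GPT-5

compression_factor = baseline_hours / assisted_hours
print(f"Compression factor: ~{compression_factor:.0f}x")  # → ~160x
```

Even if the hours-per-month assumption shifts, the ratio stays in the same triple-digit ballpark, which is what makes the result striking.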
The literature discovery revolution
Perhaps the most immediate practical application comes from GPT-5’s ability to perform what researchers call “deep literature search.” This goes beyond keyword matching.
The model recognized that a new result in density estimation was mathematically equivalent to operating on an "approximate Pareto set" in multi-objective optimization, a connection that human researchers had missed entirely because the two fields use completely different terminology.
In another notable example, GPT-5 found existing solutions to 10 Erdős problems that had previously been marked "open", tracing some to German-language papers from decades earlier. The model also found a solution hidden in a brief side note between two theorems in a 1961 paper, one that human reviewers had overlooked for more than 60 years.
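To see why this kind of search goes beyond keyword matching, consider a toy sketch of embedding-based retrieval. The vectors below are invented for illustration; a real system would embed paper abstracts with a language model, but the principle is the same: papers using disjoint vocabulary can still sit close together in embedding space.

```python
# Toy illustration of semantic search linking papers that share no keywords.
# The three "embeddings" below are made up for demonstration purposes.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: two abstracts with different terminology but
# related mathematical content, plus one genuinely unrelated paper.
density_estimation_paper = [0.90, 0.10, 0.80]
pareto_set_paper         = [0.85, 0.15, 0.75]
unrelated_paper          = [0.10, 0.90, 0.05]

print(cosine(density_estimation_paper, pareto_set_paper))   # high similarity
print(cosine(density_estimation_paper, unrelated_paper))    # low similarity
```

Keyword search would find zero overlap between "density estimation" and "approximate Pareto set"; similarity in a learned representation space is what makes the cross-field connection findable at all.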
Where human expertise remains essential
The research also highlights important limitations. Derya Unutmaz’s immunology experiments show both promise and danger.
GPT-5 correctly identified that 2-deoxy-D-glucose was interfering with N-linked glycosylation rather than simply glycolysis in T cells, a mechanistic insight that the research team had missed despite deep expertise in the field. Yet the model also required constant human oversight to catch overconfident claims and flawed reasoning.
Christian Coester's work on online algorithms demonstrates another pattern: GPT-5 excels at specific, well-defined sub-problems but struggles with open-ended theoretical questions.
When asked to prove or disprove that a particular algorithm could achieve a certain performance bound, the model devised an elegant counterexample using the Chevalley–Warning theorem. But when pushed toward more general results, it often produced flawed arguments that required human correction.
The scaffolding effect
A striking pattern emerged across all subjects: GPT-5 performed dramatically better when appropriately "scaffolded." Alex Lupsasca discovered this when the model initially failed to find a symmetry of the black hole equations.
But after first working through a simpler flat-space problem, GPT-5 successfully derived the complex curved-space symmetry, reproducing months of human work in minutes.
This scaffolding requirement reveals something fundamental about current AI capabilities. These models possess vast knowledge and computational power, but they require human expertise to direct that capacity effectively.
It's like having access to a Formula 1 engine: extremely powerful, but you still need to know how to build the rest of the car and drive it.
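The scaffolding pattern itself is simple enough to sketch as a prompting strategy: solve an easier warm-up version first, then carry that worked solution into the prompt for the harder target. The `scaffolded_prompt` helper below is hypothetical, not any real API; it only shows the shape of the technique.

```python
# A minimal sketch of the scaffolding pattern described above.
# `scaffolded_prompt` is a hypothetical helper, not a real library function.

def scaffolded_prompt(warmup_problem: str, warmup_solution: str, hard_problem: str) -> str:
    """Lead with a solved warm-up problem before posing the harder target,
    so the model can transfer the method rather than start cold."""
    return (
        "Worked example:\n"
        f"Problem: {warmup_problem}\n"
        f"Solution: {warmup_solution}\n\n"
        "Using the same approach, now solve:\n"
        f"{hard_problem}"
    )

prompt = scaffolded_prompt(
    "Find the hidden symmetry of the flat-space equations.",
    "<the model's verified flat-space derivation goes here>",
    "Find the corresponding symmetry of the curved-space (black hole) equations.",
)
print(prompt)
```

The key design choice is that the warm-up solution is verified by a human before it becomes context for the hard problem; the scaffold only helps if the rung you stand on actually holds.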
A cautionary tale
Not all the research stories are triumphant. Venkatesan Guruswami and Parikshit Gopalan's experience with a "clique-avoiding code" serves as an important warning.
GPT-5 produced a flawless proof of a problem they had wondered about for years. The excitement turned to embarrassment when they discovered that essentially the same proof had been published three years earlier.
The AI essentially committed plagiarism without realizing it, which highlights a significant challenge for AI-assisted research: ensuring proper attribution when the model cannot always identify its sources.
What this means for AI professionals
For those of us working in AI, these findings suggest we are at an inflection point. GPT-5 isn't just an improved GPT-4; it represents a qualitative change in capability. But perhaps more importantly, it shows that the way forward is not about replacing human intelligence but about creating new forms of human-AI collaboration.
Researchers repeatedly stressed that using GPT-5 effectively requires deep domain expertise. You need to know when the model is hallucinating, when to push back on its claims, and how to scaffold problems appropriately. In short, the better you are in your own field, the more value you can get from these AI partners.
As we move forward, the question is how we will adapt our workflows, our attribution systems, and our understanding of creativity to accommodate these new collaborators.
If these early experiments are any indication, the future of science will look less like humans versus machines and more like the best of both working together to push the boundaries of knowledge.