AI research

AI did real maths — here's what that does and doesn't mean

Genuinely impressive. Also genuinely narrow. Both at once.

The InsidersFeed Desk · 22 May 2026 · Verified May 2026

The answer

AI produced original, checkable maths in 2026 — real, but narrow, and not AGI.

Start with what happened. AI systems from OpenAI and Google DeepMind produced original results on Erdős problems: open questions that are easy to state and brutal to solve, with no memorised answer to crib. No lookup table exists for these, because no human had solved them first. Cracking one takes actual reasoning. The 'just autocomplete' dismissal dies on contact with the Erdős testbed.

Why the singularity crowd are also wrong

The strongest results lean on Lean, a proof assistant that checks every logical step against its axioms and inference rules. This is exactly why they're trustworthy. It's also a tell about the shape of the achievement. Lean-verified results come from a generate-then-verify loop: the model writes a step in Lean's formal language, and the checker either accepts it or rejects it and feeds the error back. The AI isn't free-soloing genius; it's operating under strict mechanical constraints. That's the innovation and the limitation, both at once.
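
For a sense of what 'accepted by the checker' looks like, here is a toy sketch in Lean 4. It bears no resemblance in difficulty to the results discussed here, and the theorem names are made up for illustration; Nat.add_comm is a lemma from Lean's core library.

```lean
-- Toy statement: addition of natural numbers is commutative.
-- `add_comm_toy` is a hypothetical name chosen for this illustration.
theorem add_comm_toy (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b  -- accepted: the cited lemma states exactly this goal

-- A flawed step is rejected rather than waved through. Uncommenting the
-- lines below makes Lean report an error, because `rfl` cannot close a
-- goal that is false for most values of `a` and `b`:
-- theorem bad_step (a b : Nat) : a + b = a := by
--   rfl
```

The point of the sketch is the asymmetry: writing the proof is the hard part, while checking it is mechanical.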

Put the two May 2026 results next to each other and the 'who won' framing collapses:

	OpenAI	DeepMind
Claim	One 1946 conjecture disproved	Nine Erdős problems + 44 conjectures solved
Verified by	Humans (incl. Timothy Gowers)	Lean, step by step, automatically
Status	Peer review pending	Each proof formally certified
The asterisk	Credible, not yet bulletproof	Brilliant — within a checkable box

The 'nine to one' scoreboard that went viral is a category error: a machine-certified proof is a different kind of object from a human-checked, peer-review-pending construction. They're not playing the same game, so counting across them is noise.

The generate-then-verify loop is the real story

Here's the part that actually matters long-term: the loop, not the result. A reliable pipeline where AI proposes and a formal system certifies works because, in mathematics, 'correct' is precisely and mechanically checkable. Lean can accept or reject a proof step with no ambiguity. Any flaw is rejected; the model gets the error back and tries again. That tight feedback cycle is what makes the results hold. Most domains don't offer that — strategy, policy, design, the messy middle of science all lack a Lean equivalent. So the achievement is sharply bounded: extraordinary inside a mechanically checkable box, untested and probably far weaker outside it. That narrowness isn't a gotcha; it's the structural cost of the trustworthiness. The certificate comes from the constraint. You can't have both the freedom and the proof.
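
The shape of that loop can be sketched in a few lines of Python. This is a toy under loud assumptions, not DeepMind's pipeline: the 'theorem' is just 'n has a non-trivial factorisation', the proposer is a dumb stand-in for a model, and every function name here is hypothetical.

```python
from typing import Callable, Optional, Tuple

Candidate = Tuple[int, int]


def check(n: int, candidate: Candidate) -> Optional[str]:
    """Toy stand-in for a proof checker.

    Returns None if the candidate is accepted, otherwise an error message.
    """
    p, q = candidate
    if p <= 1 or q <= 1:
        return "both factors must be greater than 1"
    if p * q != n:
        return f"{p} * {q} = {p * q}, not {n}"
    return None


def propose(n: int, attempt: int, last_error: Optional[str]) -> Candidate:
    """Toy stand-in for the model.

    It simply tries p = 2, 3, 4, ... in order. A real system would condition
    its next attempt on last_error; this one ignores it.
    """
    p = attempt + 2
    return (p, n // p)


def generate_then_verify(
    n: int,
    propose_fn: Callable[[int, int, Optional[str]], Candidate] = propose,
    check_fn: Callable[[int, Candidate], Optional[str]] = check,
    max_attempts: int = 1000,
) -> Optional[Candidate]:
    """Propose, check, feed the error back, repeat. Only checked answers escape."""
    last_error: Optional[str] = None
    for attempt in range(max_attempts):
        candidate = propose_fn(n, attempt, last_error)
        last_error = check_fn(n, candidate)
        if last_error is None:
            return candidate  # certified by the checker
    return None  # nothing was certified within the budget


print(generate_then_verify(91))  # → (7, 13)
```

Note what the loop guarantees and what it doesn't: anything it returns has passed the checker, but a return value of None only means the budget ran out, not that no answer exists. The guarantee also exists only because check is exact, and that is the property most domains can't supply.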

Hassabis moved quickly to temper expectations, saying the system is 'still not AGI' even as it points toward a more practical role for AI in verified mathematical research.

Source: Crypto Briefing · 26 May 2026

Trust the vendor, not the timeline

Hassabis said the system is 'still not AGI'. That's the vendor actively capping the hype on his own headline result. When the person who built it is more cautious than the people tweeting about it, you update toward the builder. AI is now a real instrument for parts of maths — narrow parts, with a formal checker holding the pen — and pretending otherwise just sets up the next disappointment cycle.

OpenAI described the result as the first time a prominent open problem, central to a subfield of mathematics, has been solved autonomously by AI.

Source: OpenAI · 20 May 2026

So file it correctly: a real research milestone, a narrow one, and a preview of how AI gets useful in serious domains — not by being trusted, but by being checked. The generate-then-verify template is the interesting part. If it spreads to other domains where claims can be formalised, that pipeline — not any individual proof — is the result worth screenshotting. The headline 'AI solves famous maths' is true. The implied 'AI is now a mathematician' is not. The machinery in between is the only part worth your attention.

Frequently asked questions

Was the AI maths breakthrough overhyped?

Both over- and under-hyped, depending on which take you encountered. It's real: AI produced new results on long-open problems, with DeepMind's verified in Lean. But it's narrow and checker-assisted, and DeepMind's CEO himself called it 'still not AGI'. The accurate read is 'genuine milestone, narrow domain', not 'AI replaced mathematicians'.

Why does the proof checker matter so much?

Because it's what makes results trustworthy. AI can produce convincing nonsense; Lean accepts a proof only if every step is logically valid against axioms. That 'AI proposes, checker certifies' loop is arguably the real innovation — and it could, in principle, extend to other domains where correctness is mechanically checkable.

If AI needed a proof checker, did it actually 'do' the maths?

Yes — the checker can't invent a proof, only accept or reject one. The AI writes the proof steps in Lean's formal language; the checker is an examiner, not a ghostwriter. The reasoning is the model's; the certificate is the checker's. Both are needed.

What's the actual limit on this technology?

The limit is formalisation. The generate-then-verify loop requires a domain where 'correct' is precisely and mechanically checkable. Pure mathematics is unusually well-suited. Most of science, strategy, and human decision-making is not — there's no Lean for ethics or taste or policy.

Sources

An OpenAI model has disproved a central conjecture in discrete geometry — OpenAI, 20 May 2026
Solving open problems with AlphaProof Nexus (preprint, arXiv:2605.22763) — Google DeepMind / arXiv, 21 May 2026
OpenAI's milestone math breakthrough played to AI's strengths — Understanding AI, 22 May 2026
Google DeepMind's AlphaProof Nexus solves 9 Erdős problems and 44 conjectures — Crypto Briefing, 26 May 2026
