Introduction

  • TL;DR: AI systems are increasingly contributing to solutions on the Erdos Problems site (1,000+ problems/conjectures).
  • A key shift is output quality: not just natural-language reasoning, but Lean-verified formal proofs in some cases (e.g., Erdos #728 via GPT-5.2 Pro + Harmonic’s Aristotle).
  • This is driven by agentic loops + evaluators/proof assistants, a pattern also seen in DeepMind’s AlphaEvolve.

What actually happened on the Erdos Problems list

TechCrunch reports that since around Christmas 2025, 15 problems moved from “open” to “solved,” and 11 explicitly credit AI involvement. The important detail is that some of these efforts end with a machine-checkable artifact (formal proof), rather than an informal explanation.

Why it matters: Formal artifacts are composable, testable, and CI-friendly—unlike prose.

Case study: Erdos Problem #728 and the “LLM + Lean” pipeline

An arXiv write-up states Erdos Problem #728 was regarded as the first open Erdos problem “fully resolved autonomously by an AI system,” combining OpenAI GPT-5.2 Pro with Harmonic’s Aristotle, producing a Lean proof. Harmonic and Reuters emphasize Aristotle’s design choice: require reasoning as Lean4 code so it can be mechanically checked.

Why it matters: Verification changes the failure mode from “sounds right” to “does it typecheck?”

AlphaEvolve shows the bigger pattern: loops + evaluators beat “one-shot answers”

DeepMind describes AlphaEvolve as an evolutionary coding agent powered by Gemini models plus automated evaluators, iterating to improve solutions. So the practical breakthrough is not “a single model got smart,” but “systems that search, test, and verify” are becoming mainstream.

Why it matters: In production settings, reliability comes from workflow design, not model branding.

Don’t overclaim: Tao’s disclaimers and OpenAI’s stance

Tao’s GitHub wiki stresses caveats: Erdos problems vary wildly; “open” status can be provisional; formal proofs can still be mis-specified or exploit library quirks. OpenAI similarly frames GPT-5.2 as accelerating exploration while keeping responsibility for correctness and interpretation with humans.

Why it matters: The honest story is “verifiable pipelines are emerging,” not “AI reached human-level math.”

Minimal Lean snippet (illustrative)

1
2
3
4
import Mathlib

theorem one_plus_one_eq_two : (1 : Nat) + 1 = 2 := by
  decide

Conclusion

  • Verified outputs (Lean proofs) are the real shift, not flashy natural-language reasoning.
  • GPT-5.2 Pro + Aristotle illustrates a workable division of labor: exploration vs. formalization/verification.
  • AlphaEvolve reinforces the winning recipe: iterative search + automated evaluators.
  • Use Tao’s disclaimers as guardrails against hype.

Summary

  • Lean-verified proofs are now appearing in AI-assisted math workflows.
  • The key enabler is verification-centric pipelines, not one-shot chat answers.
  • “Open problem solved” headlines need careful due diligence.

#ai #lean4 #formalverification #llm #math #agents #gpt52 #alphEvolve #theoremproving #reliability

References

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
AI models are starting to crack high-level math problems | TechCrunch | 2026-01-14 | https://techcrunch.com/2026/01/14/ai-models-are-starting-to-crack-high-level-math-problems/
Resolution of Erdős Problem #728: a writeup of Aristotle's Lean proof | arXiv | 2026-01-12 | https://arxiv.org/abs/2601.07421
AI contributions to Erdős problems | GitHub Wiki (teorth/erdosproblems) | 2026-01-15 | https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems
Introducing GPT-5.2 | OpenAI | 2025-12-11 | https://openai.com/index/introducing-gpt-5-2/
Advancing science and math with GPT-5.2 | OpenAI | 2025-12-11 | https://openai.com/index/gpt-5-2-for-science-and-math/
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms | Google DeepMind | 2025-05-14 | https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
Mathematical exploration and discovery at scale | arXiv | 2025-11-03 | https://arxiv.org/abs/2511.02864
Robinhood CEO's math-focused AI startup Harmonic valued at $1.45 billion | Reuters | 2025-11-25 | https://www.reuters.com/business/robinhood-ceos-math-focused-ai-startup-harmonic-valued-145-billion-latest-2025-11-25/
Aristotle Lean API | Harmonic | 2026-01-?? | https://aristotle.harmonic.fun/
Erdős Problems | erdosproblems.com | 2026-01-?? | https://www.erdosproblems.com/

[5]: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ " AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms - Google DeepMind

"