Introduction
- TL;DR: AI systems are increasingly contributing solutions to entries on the Erdos Problems site, which tracks 1,000+ problems and conjectures.
- A key shift is output quality: not just natural-language reasoning, but Lean-verified formal proofs in some cases (e.g., Erdos #728 via GPT-5.2 Pro + Harmonic’s Aristotle).
- This is driven by agentic loops + evaluators/proof assistants, a pattern also seen in DeepMind’s AlphaEvolve.
What actually happened on the Erdos Problems list
TechCrunch reports that since around Christmas 2025, 15 problems have moved from “open” to “solved,” 11 of which explicitly credit AI involvement. The important detail is that some of these efforts end with a machine-checkable artifact (a formal proof) rather than an informal explanation.
Why it matters: Formal artifacts are composable, testable, and CI-friendly—unlike prose.
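To see what “composable” means in practice, here is an illustrative sketch, assuming a recent Lean 4 toolchain (where the `omega` tactic is built in) and made-up theorem names: one verified lemma feeds directly into the next, and a single `lake build` re-checks the whole chain, which is exactly what a CI job can run.

```lean
-- Illustrative only: a verified lemma is a reusable building block.
theorem double_eq_two_mul (n : Nat) : n + n = 2 * n := by
  omega  -- built-in linear-arithmetic tactic discharges the goal

-- A downstream proof composes the lemma above; if either breaks,
-- `lake build` fails, just like a failing unit test in CI.
theorem two_dvd_double (n : Nat) : 2 ∣ (n + n) :=
  ⟨n, double_eq_two_mul n⟩
```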
Case study: Erdos Problem #728 and the “LLM + Lean” pipeline
An arXiv write-up states that Erdos Problem #728 is regarded as the first open Erdos problem “fully resolved autonomously by an AI system”: OpenAI’s GPT-5.2 Pro was combined with Harmonic’s Aristotle to produce a Lean proof. Harmonic and Reuters emphasize Aristotle’s design choice: require reasoning as Lean 4 code so it can be mechanically checked.
Why it matters: Verification changes the failure mode from “sounds right” to “does it typecheck?”
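A minimal sketch of that division of labor, with Python as glue. Here `propose_lean_proof` is a hypothetical stand-in for an LLM call (not any real API); the verification step shells out to the actual Lean compiler, whose exit code is the only signal that counts.

```python
import subprocess
import tempfile
from pathlib import Path

def check_with_lean(source: str) -> tuple[bool, str]:
    """Ask the Lean 4 compiler to check a candidate file.
    Exit code 0 means every declaration typechecked."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Candidate.lean"
        path.write_text(source)
        run = subprocess.run(
            ["lean", str(path)],  # assumes the `lean` binary is on PATH
            capture_output=True, text=True,
        )
        return run.returncode == 0, run.stdout + run.stderr

def solve(statement: str, max_attempts: int = 8) -> str | None:
    """Generate-and-verify loop: the model proposes, the compiler disposes.
    Compiler output is fed back as context for the next attempt."""
    feedback = ""
    for _ in range(max_attempts):
        # propose_lean_proof is a hypothetical LLM call, not a real API.
        candidate = propose_lean_proof(statement, feedback)
        ok, log = check_with_lean(candidate)
        if ok:
            return candidate  # a machine-checked artifact, not just prose
        feedback = log
    return None
```

Feeding compiler errors back into the next attempt is what turns a one-shot guess into a loop.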
AlphaEvolve shows the bigger pattern: loops + evaluators beat “one-shot answers”
DeepMind describes AlphaEvolve as an evolutionary coding agent powered by Gemini models plus automated evaluators, iterating to improve solutions. So the practical breakthrough is not that a single model got smarter, but that systems which search, test, and verify are becoming mainstream.
Why it matters: In production settings, reliability comes from workflow design, not model branding.
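As a toy illustration of the loop-plus-evaluator pattern (this is not AlphaEvolve’s actual code; `mutate` and `evaluate` are placeholders for an LLM-driven rewriter and an automated scorer):

```python
import random

def evolve(seed, mutate, evaluate, population_size=16, generations=50):
    """Toy evolutionary loop in the AlphaEvolve spirit: keep a population
    of candidates, score them with an automated evaluator, and mutate
    the fittest. The evaluator, not the generator, decides what survives."""
    population = [seed]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: max(2, population_size // 4)]  # elitist selection
        children = [mutate(random.choice(parents)) for _ in range(population_size)]
        population = parents + children
    return max(population, key=evaluate)

# Example: evolve an integer toward a target score.
best = evolve(seed=0,
              mutate=lambda x: x + random.choice([-1, 1]),
              evaluate=lambda x: -abs(x - 42))
```

Replace the numeric scorer with a compiler check and you are, in spirit, back at the proof pipeline above.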
Don’t overclaim: Tao’s disclaimers and OpenAI’s stance
Tao’s GitHub wiki stresses caveats: Erdos problems vary wildly; “open” status can be provisional; formal proofs can still be mis-specified or exploit library quirks. OpenAI similarly frames GPT-5.2 as accelerating exploration while keeping responsibility for correctness and interpretation with humans.
Why it matters: The honest story is “verifiable pipelines are emerging,” not “AI reached human-level math.”
Minimal Lean snippet (illustrative)
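A self-contained stand-in, assuming a recent Lean 4 toolchain. The first theorem shows the happy path: if the file compiles, the kernel has checked the proof. The second shows the mis-specification trap flagged above: it also typechecks, but only vacuously.

```lean
-- If this file compiles, the Lean kernel has checked the proof; nothing
-- is left for a human to "believe" beyond the statement itself.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- The mis-specification trap: this also typechecks, but only because
-- the hypothesis `n < 0` is impossible for natural numbers. The theorem
-- is vacuously true and says nothing about any substantive claim.
theorem vacuously_true (n : Nat) (h : n < 0) : n = 42 :=
  absurd h (Nat.not_lt_zero n)
```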
Conclusion
- Verified outputs (Lean proofs) are the real shift, not flashy natural-language reasoning.
- GPT-5.2 Pro + Aristotle illustrates a workable division of labor: exploration vs. formalization/verification.
- AlphaEvolve reinforces the winning recipe: iterative search + automated evaluators.
- Use Tao’s disclaimers as guardrails against hype.
Summary
- Lean-verified proofs are now appearing in AI-assisted math workflows.
- The key enabler is verification-centric pipelines, not one-shot chat answers.
- “Open problem solved” headlines need careful due diligence.
Recommended Hashtags
#ai #lean4 #formalverification #llm #math #agents #gpt52 #alphaevolve #theoremproving #reliability
References
[5]: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ "AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms - Google DeepMind"