Leonardo de Moura has spent decades building tools that allow mathematicians and computer scientists to verify the correctness of proofs and programs. As the creator of Lean, the open-source theorem prover that has attracted a fervent following among mathematicians and AI researchers alike, de Moura occupies a rare position at the intersection of formal verification and artificial intelligence. In a February 2026 blog post titled “When AI Writes the World’s Software,” he laid out a vision that is both exhilarating and sobering: AI systems will soon write most of the world’s software, but without formal verification, the results could be catastrophic.
De Moura’s argument is not a speculative thought experiment from someone on the periphery. He led the development of Lean at Microsoft Research before moving to the newly formed Lean Focused Research Organization (FRO), and his work has become central to efforts by companies like Google DeepMind and Meta AI to train large language models that can reason mathematically. His blog post represents one of the most technically grounded assessments yet of what happens when AI-generated code becomes the norm rather than the exception.
The Coming Flood of Machine-Written Code
The core premise of de Moura’s argument is straightforward: large language models can already generate functional code at remarkable speed, and they are improving rapidly. GitHub Copilot, Claude, GPT-4, and their successors can produce code that passes unit tests, handles edge cases, and even mimics the style of experienced developers. But de Moura warns that the speed and volume of AI-generated code will soon outstrip humanity’s ability to review it manually. “We are heading toward a world where most software is written by AI,” he wrote. “The question is not whether this will happen, but whether we will be able to trust the software that AI produces.”
This is not a hypothetical concern. As AI coding assistants become embedded in professional workflows, the ratio of machine-generated to human-written code is shifting dramatically. According to recent reporting, companies across the technology sector are already seeing significant portions of their new code produced or substantially assisted by AI. De Moura’s point is that traditional software quality gates — unit tests, integration tests, code review — were designed for a world where humans wrote code at human speed. When machines produce code orders of magnitude faster, manual testing and review become an untenable bottleneck.
Why Testing Alone Won’t Be Enough
De Moura draws a sharp distinction between testing and verification. Testing checks that software behaves correctly for a finite set of inputs. Verification proves that software behaves correctly for all possible inputs, using mathematical logic. The difference is not academic — it is the difference between a bridge that held up during its load tests and a bridge that is mathematically guaranteed not to collapse under any specified load.
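The distinction can be made concrete in Lean itself. In this illustrative sketch (the function and theorem names are invented for the example, not taken from de Moura’s post), `#eval` plays the role of a test, checking a single input, while the theorem covers every possible input:

```lean
-- A simple program: the length of two concatenated lists.
def totalLen (xs ys : List Nat) : Nat := (xs ++ ys).length

-- Testing: this checks the length property for one specific input...
#eval totalLen [1, 2] [3] == [1, 2].length + [3].length  -- true

-- ...but says nothing about any other input. Verification: a proof,
-- checked by Lean's kernel, that the property holds for *all* lists.
theorem totalLen_eq (xs ys : List Nat) :
    totalLen xs ys = xs.length + ys.length := by
  simp [totalLen]
```

No number of `#eval` checks can rule out a counterexample; the theorem, once accepted by the kernel, does.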
In his blog post, de Moura argues that formal verification must become the standard for AI-generated code, particularly in safety-critical domains such as aviation, medical devices, autonomous vehicles, and financial systems. He points to Lean as a tool that can serve this purpose, not just for pure mathematics but for verifying the correctness of software. Lean’s type system and proof language allow developers — or AI systems — to write specifications that describe what a program should do, and then prove that the program meets those specifications. “Formal verification is the only scalable way to ensure correctness when the volume of code exceeds human review capacity,” de Moura wrote.
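As a minimal illustration of the spec-then-prove workflow described here (the names `maxSpec` and `myMax` are invented for this sketch), a specification in Lean is just an ordinary proposition about a program, and the proof obligation is discharged with tactics:

```lean
-- A specification: any candidate `max` function must return a value
-- at least as large as both of its arguments.
def maxSpec (f : Nat → Nat → Nat) : Prop :=
  ∀ a b, a ≤ f a b ∧ b ≤ f a b

-- An implementation, written by a human or an AI...
def myMax (a b : Nat) : Nat := if a ≤ b then b else a

-- ...and a machine-checked proof that it meets the specification.
theorem myMax_correct : maxSpec myMax := by
  intro a b
  constructor <;> simp only [myMax] <;> split <;> omega
```

If the implementation were wrong — say, returning the smaller argument — no proof of `maxSpec` would exist, and Lean would reject every attempt.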
Lean’s Rise From Mathematical Curiosity to Industrial Tool
Lean began as a project at Microsoft Research, where de Moura started development in 2013. For years, it was primarily used by mathematicians to formalize proofs — translating informal mathematical reasoning into machine-checkable form. The project gained significant attention when mathematician Kevin Buzzard at Imperial College London began advocating for the formalization of undergraduate mathematics in Lean, and when the Lean community successfully formalized a proof of a result by Fields Medalist Peter Scholze in a project known as the Liquid Tensor Experiment.
But the tool’s relevance has expanded well beyond pure mathematics. Lean 4, the latest major version, was designed from the ground up to be both a theorem prover and a programming language. This dual nature makes it uniquely suited for the task de Moura envisions: writing software that comes with built-in mathematical guarantees of correctness. The language’s metaprogramming capabilities allow users to extend Lean itself, creating domain-specific languages and automation tactics that can handle much of the tedious work of proof construction.
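A toy illustration of that dual nature (the function name is invented for this example): the same Lean 4 file can contain code that compiles and runs like an ordinary program, alongside a theorem about that code checked by the kernel:

```lean
-- Executable code: Lean 4 compiles and runs this like any program.
def double (n : Nat) : Nat := n + n

#eval double 21  -- 42

-- A proof about the very same function, in the very same file.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

There is no gap between the artifact that runs and the artifact that is verified — they are one and the same definition.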
AI and Formal Verification: A Symbiotic Relationship
One of the most striking aspects of de Moura’s vision is the feedback loop he describes between AI and formal verification. On one hand, AI systems can generate code and even attempt to generate proofs of correctness. On the other hand, formal verification tools like Lean can check those proofs automatically, providing a reliable signal about whether the AI’s output is actually correct. This creates a training signal that is qualitatively different from the fuzzy, probabilistic feedback that current AI systems receive from unit tests or human ratings.
This idea has attracted serious investment from the AI research community. Google DeepMind’s AlphaProof system, which made headlines in 2024 for solving International Mathematical Olympiad problems, used Lean as its verification backend. Meta AI has also invested in formal verification research, exploring how large language models can be trained to produce Lean proofs. The appeal is clear: if an AI can generate a proof that Lean accepts, the proof is correct — period. There is no ambiguity, no hallucination, no subtle error that slips past a human reviewer. The proof checker is the ultimate arbiter.
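A small sketch of why that signal is unambiguous (theorem names are hypothetical): a candidate proof either elaborates and passes the kernel, or it is rejected outright, with no partial credit and no room for hallucination:

```lean
-- A candidate proof an AI might emit. Lean's kernel accepts it,
-- so the training loop receives a clean positive signal.
theorem candidate (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- A wrong candidate, such as
--   theorem bad (a b : Nat) : a + b = b + a := rfl
-- fails to type-check at all: an equally clean negative signal.
```

This pass/fail verdict is exactly the kind of ground-truth reward that reinforcement-learning pipelines for theorem-proving models are built around.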
The Gap Between Vision and Reality
For all the promise, de Moura is candid about the challenges that remain. Writing formal specifications is hard. Most software engineers have no training in formal methods, and the learning curve for tools like Lean is steep. The mathematical libraries needed to verify real-world software — covering everything from floating-point arithmetic to network protocols to cryptographic primitives — are still incomplete. And while AI systems are getting better at generating proofs, they are far from reliable enough to handle the full complexity of industrial software verification without human guidance.
There is also a cultural barrier. The software industry has, for decades, operated on the principle that “good enough” testing is sufficient. Formal verification has been confined to niches where failure is unacceptable: avionics, nuclear systems, certain cryptographic implementations. Expanding formal verification to mainstream software development would require not just better tools but a fundamental shift in how software engineers are trained and how companies allocate engineering resources.
The Economic Logic of Verified AI-Generated Code
De Moura makes an economic argument as well. As AI-generated code becomes cheaper and more abundant, the marginal cost of producing code drops toward zero. But the cost of bugs — security vulnerabilities, system failures, data corruption — remains high and may increase as software systems grow more complex and more interconnected. In this environment, the value proposition of formal verification changes. When code was expensive to write, spending additional resources on formal proofs seemed like a luxury. When code is nearly free to generate, the dominant cost becomes ensuring that the code is correct. Verification becomes not a luxury but a necessity.
This logic is already playing out in certain sectors. The cryptocurrency and decentralized finance industries, where smart contract bugs can result in the immediate and irreversible loss of hundreds of millions of dollars, have been early adopters of formal verification. Companies like CertiK and Runtime Verification have built businesses around formally verifying blockchain code. De Moura suggests that as AI-generated code proliferates, similar demand for verification will emerge across the broader software industry.
What the Next Five Years Could Look Like
De Moura’s blog post is ultimately a call to action. He argues that the formal verification community needs to prepare for a world in which AI generates vast quantities of code, and that tools like Lean need to become more accessible, more powerful, and more tightly integrated with AI development workflows. He envisions a future in which AI systems generate both code and proofs simultaneously, with formal verification serving as an automated quality gate that catches errors before they reach production.
The stakes are significant. If AI-generated code is deployed at scale without adequate verification, the result could be a proliferation of subtle bugs and security vulnerabilities that overwhelm the capacity of human engineers to find and fix them. If, on the other hand, the formal verification infrastructure is in place, AI-generated code could be more reliable than human-written code — not because AI is infallible, but because every line of output can be checked against a rigorous mathematical specification.
De Moura’s vision places Lean at the center of this future, but the broader point transcends any single tool. The question he poses — how do we trust software that no human wrote or fully reviewed? — is one that the entire technology industry will need to answer in the years ahead. His answer, refined over decades of work in formal methods, is that trust must be grounded in mathematical proof, not human intuition. Whether the industry listens may determine the reliability of the software that increasingly runs the world.
