Generating correctness proofs with neural networks

Foundational verification allows programmers to build software that has been empirically shown to have high levels of assurance in a variety of important domains. However, the cost of producing foundationally verified software remains prohibitively high for most projects, as it requires significant manual effort by highly trained experts. In this paper we present Proverbot9001, a proof search system that uses machine learning techniques to produce proofs of software correctness in interactive theorem provers. We demonstrate Proverbot9001 on the proof obligations from a large practical proof project, the CompCert verified C compiler, and show that it can effectively automate proofs that were previously written manually: combined with solver-based tooling, it automatically produces proofs for 28% of the theorem statements in our test dataset. Without any additional solvers, it achieves a proof completion rate that is a 4x improvement over prior state-of-the-art machine learning models for generating proofs in Coq.
