Modelling High-Level Mathematical Reasoning in Mechanised Declarative Proofs

Mathematical proofs can be mechanised using proof assistants to eliminate gaps and errors. However, mechanisation still requires intensive labour. To promote automation, it is essential to capture high-level human mathematical reasoning, which we frame as the problem of generating suitable propositions. We build a non-synthetic dataset from the largest repository of mechanised proofs and propose a task on causal reasoning, in which a model must fill in a missing intermediate proposition given a causal context. Our experiments with various neural sequence-to-sequence models reveal that, while the task is challenging, neural models can indeed capture non-trivial mathematical reasoning. We further propose a hierarchical transformer model that outperforms the transformer baseline.
