HolStep: A Machine Learning Dataset for Higher-order Logic Theorem Proving

Large computer-understandable proofs consist of millions of intermediate logical steps. The vast majority of such steps originate from manually selected and manually guided heuristics applied to intermediate goals. So far, machine learning has generally not been used to filter or generate these steps. In this paper, we introduce a new dataset based on Higher-Order Logic (HOL) proofs, for the purpose of developing new machine learning-based theorem-proving strategies. We make this dataset publicly available under the BSD license. We propose various machine learning tasks that can be performed on this dataset, and discuss their significance for theorem proving. We also benchmark a set of simple baseline machine learning models suited for the tasks (including logistic regression, convolutional neural networks and recurrent neural networks). The results of our baseline models show the promise of applying machine learning to HOL theorem proving.
