Reinforcement Learning of Theorem Proving

We introduce a theorem proving algorithm that uses practically no domain heuristics for guiding its connection-style proof search. Instead, it runs many Monte-Carlo simulations guided by reinforcement learning from previous proof attempts. We produce several versions of the prover, parameterized by different learning and guiding algorithms. The strongest version of the system is trained on a large corpus of mathematical problems and evaluated on previously unseen problems. Given the same inference budget, the trained system solves over 40% more problems than a baseline prover, which is an unusually high improvement in this hard AI domain. To our knowledge, this is the first time reinforcement learning has been convincingly applied to solving general mathematical problems on a large scale.
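To make the idea of learning-guided Monte-Carlo simulation concrete, the following is a minimal sketch of how one step of such a search might pick the next inference to try. It is an illustration under stated assumptions, not the system's actual procedure: the abstract does not give the selection formula, so the PUCT-style score, the `Child` record, the `prior` field (standing in for a learned policy estimate), and the `exploration` constant are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    """Search statistics for one candidate inference step (hypothetical record)."""
    prior: float              # assumed: learned policy estimate for this step
    visits: int = 0           # how many simulations went through this step
    total_value: float = 0.0  # sum of values backed up through this step

    def mean_value(self) -> float:
        # Average backed-up value; 0.0 for a step that was never tried.
        return self.total_value / self.visits if self.visits else 0.0


def select_child(children, exploration=1.0):
    """Return the index of the child with the highest PUCT-style score.

    Assumed score: mean value plus an exploration bonus that grows with the
    learned prior and shrinks as the child accumulates visits.
    """
    parent_visits = sum(c.visits for c in children)

    def score(c):
        bonus = exploration * c.prior * math.sqrt(parent_visits + 1) / (1 + c.visits)
        return c.mean_value() + bonus

    return max(range(len(children)), key=lambda i: score(children[i]))


def backup(child, value):
    """Record the outcome of one simulation that passed through `child`."""
    child.visits += 1
    child.total_value += value
```

In this sketch, a step with a high prior is tried first, but repeated unrewarding simulations shrink its bonus until a lower-prior alternative is explored instead.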
