Learning to Reason in Large Theories without Imitation

In this paper, we demonstrate how to perform automated theorem proving in the presence of a large knowledge base of potential premises without learning from human proofs. We propose an exploration mechanism for deep reinforcement learning that mixes in additional premises selected by a tf-idf (term frequency-inverse document frequency) based lookup. This helps the prover explore and learn which premises are relevant for proving a new theorem. Our experiments show that a theorem prover trained with this exploration mechanism outperforms provers trained only on human proofs, and approaches the performance of a prover trained by a combination of imitation and reinforcement learning. We perform multiple experiments to understand the importance of the underlying assumptions that make our exploration approach work, thereby justifying our design choices.
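As a concrete illustration of the retrieval step, the sketch below ranks a knowledge base of premise statements by tf-idf similarity to the current goal; the retrieved premises would then be mixed into the candidates considered during reinforcement-learning rollouts. This is a minimal sketch using scikit-learn, not the paper's implementation; the `select_premises` function, its arguments, and the choice of k are hypothetical names and values for illustration only.

```python
# Minimal sketch of a tf-idf based premise lookup (assumed interface, not the paper's code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def select_premises(goal: str, premises: list[str], k: int = 16) -> list[str]:
    """Return the k premise statements most similar to the goal under tf-idf."""
    vectorizer = TfidfVectorizer()                     # bag-of-words tf-idf over statement text
    premise_vecs = vectorizer.fit_transform(premises)  # one row per premise in the knowledge base
    goal_vec = vectorizer.transform([goal])            # embed the goal in the same vocabulary
    scores = cosine_similarity(goal_vec, premise_vecs).ravel()
    top = scores.argsort()[::-1][:k]                   # indices of the k highest-scoring premises
    return [premises[i] for i in top]
```

During exploration, such lookup-selected premises could be sampled alongside the premises ranked highest by the learned model, so that rollouts occasionally try premises the current policy would not pick on its own.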
