Learning to Reason in Large Theories without Imitation

In this paper, we demonstrate how to perform automated theorem proving in the presence of a large knowledge base of potential premises without learning from human proofs. We propose an exploration mechanism for deep reinforcement learning that mixes in additional premises selected by a tf-idf (term frequency-inverse document frequency) based lookup. This helps the prover explore and learn which premises are relevant for proving a new theorem. Our experiments show that a theorem prover trained with this exploration mechanism outperforms provers trained only on human proofs, and approaches the performance of a prover trained by a combination of imitation and reinforcement learning. We perform multiple experiments to understand the importance of the underlying assumptions that make our exploration approach work, thereby justifying our design choices.
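As a concrete illustration of the retrieval step, the sketch below ranks a knowledge base of premise statements by tf-idf similarity to the current goal; the retrieved premises would then be mixed into the candidates considered during reinforcement-learning rollouts. This is a minimal sketch using scikit-learn, not the paper's implementation; the `select_premises` function, its arguments, and the choice of k are hypothetical names and values for illustration only.

```python
# Minimal sketch of a tf-idf based premise lookup (assumed interface, not the paper's code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def select_premises(goal: str, premises: list[str], k: int = 16) -> list[str]:
    """Return the k premise statements most similar to the goal under tf-idf."""
    vectorizer = TfidfVectorizer()                     # bag-of-words tf-idf over statement text
    premise_vecs = vectorizer.fit_transform(premises)  # one row per premise in the knowledge base
    goal_vec = vectorizer.transform([goal])            # embed the goal in the same vocabulary
    scores = cosine_similarity(goal_vec, premise_vecs).ravel()
    top = scores.argsort()[::-1][:k]                   # indices of the k highest-scoring premises
    return [premises[i] for i in top]
```

During exploration, such lookup-selected premises could be sampled alongside the premises ranked highest by the learned model, so that rollouts occasionally try premises the current policy would not pick on its own.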
