LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks. Here, we replace architecture engineering with encoding inductive bias in the form of datasets. Inspired by Peirce’s view that deduction, induction, and abduction are the primitives of reasoning, we design three synthetic tasks, each intended to require one of these three abilities. We deliberately make these tasks synthetic and devoid of mathematical knowledge so that only the fundamental reasoning biases can be learned from them. This defines a new pre-training methodology called “LIME” (Learning Inductive bias for Mathematical rEasoning). Models trained with LIME significantly outperform vanilla transformers on four very different large mathematical reasoning benchmarks. Unlike traditional pre-training, which typically dominates the overall computation budget, LIME requires only a small fraction of the computation cost of a typical downstream task. The code for generating LIME tasks is available at https://github.com/tonywu95/LIME.
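
The abstract describes the three primitive tasks only at a high level. As a concrete illustration, the sketch below shows one plausible way synthetic “rule / case / result” examples could be generated and serialized into deduction, abduction, and induction problems, loosely following Peirce’s split. The vocabularies, separators, and serialization format here are assumptions made for illustration, not the released LIME data format; see the linked repository for the authors’ actual generators.

```python
import random
import string

# Illustrative sketch only (assumed format, not the official LIME generator):
# build symbolic Rule / Case / Result triples and turn each triple into the
# three reasoning tasks described in the abstract.

RULE_SYMBOLS = list(string.ascii_uppercase[:10])   # placeholder symbols used in rules
VALUE_SYMBOLS = list(string.ascii_lowercase[:20])  # concrete symbols used in cases


def make_triple(rule_len=8, max_sub_len=3, rng=random):
    """Return (rule, case, result) built purely from abstract symbols."""
    # Rule: a random string over placeholder symbols.
    rule = [rng.choice(RULE_SYMBOLS) for _ in range(rule_len)]
    # Case: a substitution mapping each placeholder to a short value string.
    case = {s: [rng.choice(VALUE_SYMBOLS) for _ in range(rng.randint(1, max_sub_len))]
            for s in set(rule)}
    # Result: the rule with every placeholder rewritten according to the case.
    result = [v for s in rule for v in case[s]]
    return rule, case, result


def encode(rule, case, result, task):
    """Serialize one (source, target) training example for a given primitive."""
    case_str = " , ".join(f"{k} : {' '.join(v)}" for k, v in sorted(case.items()))
    rule_str, result_str = " ".join(rule), " ".join(result)
    if task == "deduct":    # Rule + Case  -> Result
        return f"<rule> {rule_str} <case> {case_str}", result_str
    if task == "abduct":    # Rule + Result -> Case
        return f"<rule> {rule_str} <result> {result_str}", case_str
    if task == "induct":    # Case + Result -> Rule
        return f"<case> {case_str} <result> {result_str}", rule_str
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    rule, case, result = make_triple()
    for task in ("deduct", "abduct", "induct"):
        src, tgt = encode(rule, case, result, task)
        print(task, "|", src, "=>", tgt)
```

Examples of this kind contain no mathematical content; a sequence-to-sequence transformer pre-trained on them can only pick up the structural regularities of rule application, rule inference, and case inference before being fine-tuned on a mathematical reasoning benchmark.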
