LAMBADA: Backward Chaining for Automated Reasoning in Natural Language

Remarkable progress has been made on automated reasoning with natural text using Large Language Models (LLMs) and methods such as Chain-of-Thought prompting and Selection-Inference. These techniques search for proofs in the forward direction, from the axioms toward the conclusion, which suffers from a combinatorial explosion of the search space and hence high failure rates on problems requiring longer chains of reasoning. The classical automated reasoning literature has shown that reasoning in the backward direction (i.e., from the intended conclusion toward the supporting axioms) is significantly more efficient at proof-finding. Importing this intuition into the LLM setting, we develop a backward-chaining algorithm, called LAMBADA, that decomposes reasoning into four sub-modules, each implemented by few-shot prompted LLM inference. We show that LAMBADA achieves sizable accuracy boosts over state-of-the-art forward reasoning methods on two challenging logical reasoning datasets, particularly when deep and accurate proof chains are required.
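The backward-chaining idea can be sketched in plain Python. This is a minimal, hypothetical stand-in, not the paper's implementation: in LAMBADA each sub-module is a few-shot prompted LLM call, whereas here `fact_check`, `rule_selection`, and `goal_decomposition` are simple symbolic stubs over string facts and rules, and the sign-agreement handling is reduced to a crude negation check. The point is only the control flow: search starts at the goal and recurses toward supporting facts.

```python
def fact_check(goal, facts):
    """Stand-in for LAMBADA's Fact Check module: is the goal directly
    settled by a known fact? Returns True/False, or None if unknown."""
    if goal in facts:
        return True
    # Crude negation check (the paper handles signs with a dedicated module).
    if ("not " + goal) in facts or (goal.startswith("not ") and goal[4:] in facts):
        return False
    return None

def rule_selection(goal, rules):
    """Stand-in for Rule Selection: keep rules whose consequent
    matches the current goal. Rules are (antecedents, consequent) pairs."""
    return [r for r in rules if r[1] == goal]

def goal_decomposition(rule):
    """Stand-in for Goal Decomposition: a rule's antecedents
    become the new sub-goals to prove."""
    return rule[0]

def prove(goal, facts, rules, depth=4):
    """Backward chaining: work from the intended conclusion toward the
    axioms, bounding recursion depth to avoid cycles."""
    verdict = fact_check(goal, facts)
    if verdict is not None:
        return verdict
    if depth == 0:
        return False
    for rule in rule_selection(goal, rules):
        subgoals = goal_decomposition(rule)
        if all(prove(g, facts, rules, depth - 1) for g in subgoals):
            return True
    return False

# Toy example: chain of length two, proved goal-first.
facts = {"tom is a cat"}
rules = [
    (["tom is a cat"], "tom is a mammal"),
    (["tom is a mammal"], "tom is an animal"),
]
print(prove("tom is an animal", facts, rules))  # True
```

Note how the branching factor at each step is limited to rules whose consequent matches the current goal, which is the source of backward chaining's efficiency advantage over forward search through all applicable rules.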
