Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning

Language models have been successfully applied to a variety of reasoning tasks in NLP, yet they still struggle with compositional generalization. In this paper we present Explainable Verbal Reasoner Plus (EVR+), a reasoning framework that enhances language models' compositional reasoning ability by (1) allowing the model to explicitly generate and execute symbolic operators, and (2) allowing the model to decompose a complex task into several simpler ones in a flexible manner. Compared with its predecessor Explainable Verbal Reasoner (EVR) and other previous approaches adopting similar ideas, our framework supports more diverse types of reasoning, such as nested loops and different types of recursion. To evaluate the framework, we build a synthetic dataset with five tasks that require compositional reasoning. Results show that our framework improves a fine-tuned language model's compositional generalization performance on all five tasks. We also discuss the possibility of, and the challenges in, combining our reasoning framework with a few-shot prompted language model.
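The control loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: a stand-in "model" proposes either a symbolic operator call or a decomposition into simpler subtasks, and an interpreter executes operators and recurses on subtasks. All names (`toy_model`, `OPERATORS`, the task tuples) are illustrative assumptions; here the toy model handles nested summation, a case of the recursive decomposition the abstract mentions.

```python
# Hypothetical sketch of an EVR+-style decompose-and-execute loop.
# A real system would replace toy_model with a (fine-tuned or prompted)
# language model emitting operator calls or subtask lists.

OPERATORS = {
    "add": lambda args: sum(args),        # symbolic operator
    "identity": lambda args: args[0],     # base case: return a value as-is
}

def toy_model(task):
    """Stand-in for the language model: given a task, return either an
    operator call ({"op", "args"}) or a decomposition ({"subtasks",
    "combine"}) into simpler tasks, which may themselves be nested."""
    kind, payload = task
    if kind == "sum_nested":
        subtasks = [("sum_nested", x) if isinstance(x, list) else ("value", x)
                    for x in payload]
        return {"subtasks": subtasks, "combine": "add"}
    # kind == "value": nothing left to decompose
    return {"op": "identity", "args": [payload]}

def solve(task):
    """Interpreter: execute symbolic operators directly; otherwise recurse
    on each subtask and combine the results with the named operator."""
    step = toy_model(task)
    if "op" in step:
        return OPERATORS[step["op"]](step["args"])
    results = [solve(sub) for sub in step["subtasks"]]
    return OPERATORS[step["combine"]](results)

print(solve(("sum_nested", [1, [2, 3], [4, [5]]])))  # nested recursion -> 15
```

The recursion depth is determined by the task itself rather than fixed in advance, which is what lets this style of interpreter express nested loops and recursion that a single forward pass cannot.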
