APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning

Logical reasoning over text is an important ability that requires understanding the semantics of the text and reasoning over it to arrive at correct inferences. Prior work on pretraining language models to improve their logical reasoning ability requires complex processing of the training data (e.g., aligning symbolic knowledge to text), yielding task-specific data augmentation that is hard to adapt to an arbitrary text corpus. In this work, we propose APOLLO, a simple adaptive pretraining approach that improves the logical reasoning skills of language models. We select a subset of Wikipedia for adaptive pretraining using a set of logical-inference keywords as filter words. Further, we propose two self-supervised loss functions for training. First, we modify the masked language modeling loss to mask only words with specific parts of speech that likely require higher-order reasoning to predict. Second, we propose a sentence-level classification loss that teaches the model to distinguish between entailment-type and contradiction-type sentences. The proposed pretraining paradigm is both simple and independent of task format. We demonstrate the effectiveness of APOLLO by comparing it with prior baselines on two logical reasoning datasets: APOLLO performs comparably to the baselines on ReClor and outperforms them on LogiQA.
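To make the data-side components concrete, below is a minimal sketch (not the authors' released implementation) of the two ideas described above: keyword-based selection of sentences that likely involve logical inference, and selective masking of reasoning-heavy parts of speech. The keyword list, the choice of POS tags, the mask probability, and the use of NLTK are illustrative assumptions.

```python
# Sketch of APOLLO-style data selection and selective masking.
# Keyword set, POS choices, and mask rate are illustrative assumptions,
# not the exact values used in the paper.
import random
import nltk

# Hypothetical filter words signalling logical inference.
REASONING_KEYWORDS = {
    "therefore", "hence", "thus", "because", "consequently", "accordingly", "so",
}

# POS tag prefixes assumed to need higher-order reasoning to predict
# (verbs, adjectives, adverbs as an illustrative choice).
MASKABLE_POS_PREFIXES = ("VB", "JJ", "RB")


def select_sentences(corpus):
    """Keep only sentences containing at least one reasoning keyword."""
    for sentence in corpus:
        tokens = {tok.lower() for tok in nltk.word_tokenize(sentence)}
        if tokens & REASONING_KEYWORDS:
            yield sentence


def selective_mask(sentence, mask_token="[MASK]", mask_prob=0.3):
    """Mask a random subset of tokens whose POS tag suggests reasoning content."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    out = []
    for word, tag in tagged:
        if tag.startswith(MASKABLE_POS_PREFIXES) and random.random() < mask_prob:
            out.append(mask_token)
        else:
            out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    # Resource names differ across NLTK versions; download both variants quietly.
    for resource in ("punkt", "punkt_tab",
                     "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
        nltk.download(resource, quiet=True)

    corpus = [
        "The premises are inconsistent; therefore the argument is unsound.",
        "The sky is blue today.",
    ]
    for s in select_sentences(corpus):
        print(selective_mask(s))
```

In the full approach, the masked corpus would then be used with the standard MLM objective, and the sentence-level entailment/contradiction classification loss would be added on top; that classification head is omitted from this sketch for brevity.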
