CausalBERT: Injecting Causal Knowledge Into Pre-trained Models with Minimal Supervision

Recent work has shown success in incorporating pre-trained models like BERT to improve NLP systems. However, existing pre-trained models lack causal knowledge, which prevents today's NLP systems from reasoning like humans. In this paper, we investigate the problem of injecting causal knowledge into pre-trained models. There are two fundamental challenges: 1) how to collect a large-scale causal resource from unstructured text; 2) how to effectively inject causal knowledge into pre-trained models. To address these issues, we propose CausalBERT, which collects the largest-scale causal resource to date using precise causal patterns and causal embedding techniques. In addition, we adopt a regularization-based method that adds an extra regularization term to preserve already-learned knowledge while injecting causal knowledge. Extensive experiments on 7 datasets, including four causal pair classification tasks, two causal QA tasks, and one causal inference task, demonstrate that CausalBERT captures rich causal knowledge and outperforms state-of-the-art methods based on pre-trained models, setting a new benchmark for causal inference.
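A minimal sketch of the regularization idea described above, not the paper's exact implementation: fine-tune a pre-trained model on causal supervision while adding an L2 penalty that pulls the parameters back toward their pre-trained values, so previously learned knowledge is preserved. The base model name, label count, regularization strength, and batch format below are illustrative assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # assumed base model, not necessarily the paper's
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Snapshot of the pre-trained parameters, used as the anchor of the regularizer.
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
reg_lambda = 0.01  # assumed regularization strength

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def training_step(batch):
    """One step of causal-knowledge injection with an extra 'stay close' L2 term."""
    outputs = model(**batch)          # batch holds input_ids, attention_mask, labels
    task_loss = outputs.loss          # e.g. cause-effect pair classification loss
    reg_loss = sum(((p - anchor[n]) ** 2).sum() for n, p in model.named_parameters())
    loss = task_loss + reg_lambda * reg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The anchor term is a simplification; methods such as elastic weight consolidation weight each parameter by its estimated importance rather than penalizing all deviations equally.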
