ERNIE-NLI: Analyzing the Impact of Domain-Specific External Knowledge on Enhanced Representations for NLI

We examine how variations in domain-specific external knowledge affect the performance of large-scale pretrained language models. Recent work on enhancing BERT with external knowledge has attracted considerable attention, resulting in models such as ERNIE (Zhang et al., 2019a). Using the ERNIE architecture, we provide a detailed analysis of the types of knowledge that yield a performance increase on the Natural Language Inference (NLI) task, specifically on the Multi-Genre Natural Language Inference Corpus (MNLI). While ERNIE uses general-purpose TransE embeddings, we instead train domain-specific knowledge embeddings and insert this knowledge through the information fusion layer of the ERNIE architecture, allowing us to directly control and analyze the knowledge input. Across several knowledge training objectives, knowledge sources, and knowledge ablations, we find a strong correlation between knowledge and classification labels of the same polarity, indicating that knowledge polarity is an important feature in predicting entailment. We also perform a classification change analysis across the different knowledge variations to illustrate the importance of selecting knowledge input that is appropriate in both content and polarity, and we show representative examples of these changes.
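
For reference, the TransE objective (Bordes et al., 2013) with which ERNIE's original entity embeddings are trained encourages h + r ≈ t for each observed knowledge-graph triple (h, r, t). The following is a minimal sketch of that standard margin-based loss, with notation taken from the original TransE formulation rather than from the domain-specific objectives studied here:

L = \sum_{(h,r,t) \in S} \; \sum_{(h',r,t') \in S'_{(h,r,t)}} \big[ \gamma + d(\mathbf{h} + \mathbf{r}, \mathbf{t}) - d(\mathbf{h'} + \mathbf{r}, \mathbf{t'}) \big]_{+}

where d is an L1 or L2 distance over the embeddings, \gamma > 0 is a margin, S is the set of observed triples, and S'_{(h,r,t)} contains corrupted triples in which the head or tail entity has been replaced by a random entity.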

[1] Chris Callison-Burch, et al. PPDB: The Paraphrase Database, 2013, NAACL.

[2] SangKeun Lee, et al. Dynamic Self-Attention: Computing Attention over Words Dynamically for Sentence Embedding, 2018, arXiv.

[3] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.

[4] Christopher Potts, et al. A Fast Unified Model for Parsing and Sentence Understanding, 2016, ACL.

[5] Hinrich Schütze, et al. Negated LAMA: Birds cannot fly, 2019, arXiv.

[6] Catherine Havasi, et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, 2016, AAAI.

[7] Allyson Ettinger, et al. What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models, 2019, TACL.

[8] Jihun Choi, et al. Learning to Compose Task-Specific Tree Structures, 2017, AAAI.

[9] Roy Schwartz, et al. Knowledge Enhanced Contextual Word Representations, 2019, EMNLP/IJCNLP.

[10] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.

[11] Hao Tian, et al. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding, 2019, AAAI.

[12] Jin-Hyuk Hong, et al. Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information, 2018, AAAI.

[13] Chengqi Zhang, et al. Reinforced Self-Attention Network: A Hybrid of Hard and Soft Attention for Sequence Modeling, 2018, IJCAI.

[14] Yoav Shoham, et al. SenseBERT: Driving Some Sense into BERT, 2019, ACL.

[15] Christopher D. Manning, et al. An Extended Model of Natural Logic, 2009, IWCS.

[16] Zhiguo Wang, et al. Bilateral Multi-Perspective Matching for Natural Language Sentences, 2017, IJCAI.

[17] Christopher Potts, et al. A Large Annotated Corpus for Learning Natural Language Inference, 2015, EMNLP.

[18] Sebastian Riedel, et al. Language Models as Knowledge Bases?, 2019, EMNLP.

[19] Jonathan Berant, et al. oLMpics - On What Language Model Pre-training Captures, 2019, TACL.

[20] Li Guo, et al. Context-Dependent Knowledge Graph Embedding, 2015, EMNLP.

[21] Yu Sun, et al. ERNIE: Enhanced Representation through Knowledge Integration, 2019, arXiv.

[22] Hai Zhao, et al. Semantics-aware BERT for Language Understanding, 2020, AAAI.

[23] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.

[24] Huanbo Luan, et al. Modeling Relation Paths for Representation Learning of Knowledge Bases, 2015, EMNLP.

[25] Li Guo, et al. Semantically Smooth Knowledge Graph Embedding, 2015, ACL.

[26] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[27] Adrian Iftene, et al. Hypothesis Transformation and Semantic Variability Rules Used in Recognizing Textual Entailment, 2007, ACL-PASCAL@ACL.

[28] Zhiyuan Liu, et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion, 2015, AAAI.

[29] Zhen-Hua Ling, et al. Neural Natural Language Inference Models Enhanced with External Knowledge, 2017, ACL.

[30] Fangzhao Wu, et al. Logic Rules Powered Knowledge Graph Embedding, 2019, arXiv.

[31] Xiaodong Liu, et al. Multi-Task Deep Neural Networks for Natural Language Understanding, 2019, ACL.

[32] Maosong Sun, et al. ERNIE: Enhanced Language Representation with Informative Entities, 2019, ACL.

[33] Johan Bos, et al. Recognising Textual Entailment with Logical Inference, 2005, HLT.

[34] Sungzoon Cho, et al. Distance-based Self-Attention Network for Natural Language Inference, 2017, arXiv.

[35] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, JMLR.

[36] Rui Yan, et al. Natural Language Inference by Tree-Based Convolution and Heuristic Matching, 2015, ACL.

[37] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[38] Anna Korhonen, et al. Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity, 2019, COLING.

[39] Jason Weston, et al. Translating Embeddings for Modeling Multi-relational Data, 2013, NIPS.

[40] Zhen Wang, et al. Knowledge Graph and Text Jointly Embedding, 2014, EMNLP.

[41] Florian Boudin, et al. pke: An Open Source Python-based Keyphrase Extraction Toolkit, 2016, COLING.

[42] Han Xiao, et al. TransG: A Generative Model for Knowledge Graph Embedding, 2015, ACL.

[43] Lawrence S. Moss, et al. Probing Natural Language Inference Models through Semantic Fragments, 2020, AAAI.