Learning from Context or Names? An Empirical Study on Neural Relation Extraction

Neural models have achieved remarkable success on relation extraction (RE) benchmarks. However, there is no clear understanding of which types of information existing RE models use to make decisions, or of how to further improve their performance. To this end, we empirically study the effect of the two main information sources in text: textual context and entity mentions (names). We find that (i) while context is the main source supporting predictions, RE models also rely heavily on information from entity mentions, most of which is type information, and (ii) existing datasets may leak shallow heuristics via entity mentions, which contributes to the high performance on RE benchmarks. Based on these analyses, we propose an entity-masked contrastive pre-training framework for RE that gains a deeper understanding of both textual context and type information while avoiding rote memorization of entities or reliance on superficial cues in mentions. We carry out extensive experiments to support our views and show that our framework can improve the effectiveness and robustness of neural models in different RE scenarios. All the code and datasets are released at this https URL.
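The core idea named in the abstract, entity-masked contrastive pre-training, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' released implementation: the helper names (mask_entities, contrastive_loss), the InfoNCE-style loss formulation, the toy sentence, and the 768-dimensional placeholder encodings are all hypothetical choices introduced for clarity; in practice the encodings would come from a pre-trained encoder such as BERT.

# Minimal sketch (assumed, not the authors' code) of entity-masked
# contrastive pre-training for relation extraction.
import torch
import torch.nn.functional as F

def mask_entities(tokens, entity_spans, mask_token="[MASK]"):
    """Replace entity-mention tokens with a mask token so the model cannot
    memorize surface names and must rely on context and type information."""
    masked = list(tokens)
    for start, end in entity_spans:
        for i in range(start, end):
            masked[i] = mask_token
    return masked

def contrastive_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style objective: pull together sentence representations that
    (distantly) express the same relation, push apart those that do not."""
    anchor = F.normalize(anchor, dim=-1)        # (d,)
    positive = F.normalize(positive, dim=-1)    # (d,)
    negatives = F.normalize(negatives, dim=-1)  # (n, d)
    pos_score = (anchor * positive).sum() / temperature   # scalar
    neg_scores = negatives @ anchor / temperature          # (n,)
    logits = torch.cat([pos_score.unsqueeze(0), neg_scores])
    # The positive pair sits at index 0 and should get the highest score.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

# Toy usage: mask the entity mentions of a sentence, then compute the loss on
# random placeholder encodings standing in for encoder outputs.
tokens = ["Bill", "Gates", "founded", "Microsoft", "in", "1975", "."]
print(mask_entities(tokens, entity_spans=[(0, 2), (3, 4)]))
loss = contrastive_loss(torch.randn(768), torch.randn(768), torch.randn(8, 768))
print(loss.item())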
