ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning

Pre-trained Language Models (PLMs) have shown strong performance on various downstream Natural Language Processing (NLP) tasks. However, PLMs still struggle to capture the factual knowledge in text, which is crucial for understanding a document as a whole, especially in document-level language understanding tasks. To address this issue, we propose ERICA, a novel contrastive learning framework applied in the pre-training phase, to obtain a deeper understanding of entities and their relations in text. Specifically, (1) to better understand entities, we propose an entity discrimination task that distinguishes which tail entity can be inferred from a given head entity and relation; (2) to better understand relations, we employ a relation discrimination task that distinguishes whether two entity pairs are close in relational semantics. Experimental results demonstrate that ERICA achieves consistent improvements on several document-level language understanding tasks, including relation extraction and reading comprehension, especially in low-resource settings. Meanwhile, ERICA achieves comparable or better performance on sentence-level tasks. We will release the datasets, source code and pre-trained language models for further research exploration.
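The abstract describes two contrastive pre-training objectives: entity discrimination and relation discrimination. As a rough illustration of the relation discrimination idea, below is a minimal sketch of an InfoNCE-style contrastive loss over entity-pair ("relation") embeddings, which pulls together pairs that express the same relation and pushes apart pairs that do not. The function name, tensor shapes, and temperature value are illustrative assumptions and do not reproduce ERICA's exact objective or implementation.

```python
import torch
import torch.nn.functional as F

def relation_discrimination_loss(anchor, positive, negatives, temperature=0.07):
    """Hypothetical InfoNCE-style sketch, not the authors' implementation.

    anchor:    (d,)   embedding of one entity pair
    positive:  (d,)   embedding of another pair expressing the same relation
    negatives: (n, d) embeddings of pairs expressing different relations
    """
    # Cosine similarities, scaled by a temperature hyperparameter.
    anchor = F.normalize(anchor, dim=-1)
    pos_sim = torch.dot(anchor, F.normalize(positive, dim=-1)) / temperature
    neg_sim = F.normalize(negatives, dim=-1) @ anchor / temperature

    # The anchor should be closer to the positive pair than to any negative pair,
    # which is equivalent to cross-entropy with the positive at index 0.
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim])   # shape: (1 + n,)
    target = torch.zeros(1, dtype=torch.long)              # positive is class 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```

The entity discrimination task can be viewed analogously: given a head entity and relation, the representation of the correct tail entity is treated as the positive and other entities in the document as negatives.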
