Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning

Although language-model text embeddings have transformed NLP research, their ability to capture high-level semantic information, such as relations between entities in text, remains limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations expressed in a graph structure. Given a sentence (unstructured text) and its corresponding graph, we use contrastive learning to impose relation-related structure on the token-level representations of the sentence obtained with a CharacterBERT (El Boukkouri et al., 2020) model. The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task using only a simple k-NN classifier, demonstrating the success of the proposed method. Additional visualization with a t-SNE analysis shows the effectiveness of the learned representation space compared to baselines. Furthermore, we show that a separate space can be learned for named entity recognition, again using a contrastive learning objective, and demonstrate how to successfully combine both representation spaces in a joint entity-relation task.
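The abstract does not reproduce the training objective; as an illustrative sketch, the supervised contrastive loss of Khosla et al. [6], which this line of work builds on, pulls together representations that share a relation label and pushes apart those that do not. The function name `supcon_loss`, the use of pre-pooled embeddings, and the temperature value are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss (Khosla et al., 2020) over one batch.

    embeddings: (N, d) array of relation representations (e.g. pooled
    token-level CharacterBERT vectors for an entity pair); labels: (N,)
    relation labels. Samples sharing a label act as positives.
    """
    labels = np.asarray(labels)
    # L2-normalize so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # denominator excludes the anchor itself (stable log-sum-exp)
    sim_others = np.where(self_mask, -np.inf, sim)
    row_max = sim_others.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(sim_others - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    # positives: same label, excluding the anchor itself
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)
    per_anchor = (log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    # average the negated log-likelihood over anchors that have positives
    return -per_anchor[has_pos].mean()
```

At inference time, a representation space trained this way supports the simple k-NN classifier the abstract describes: a test embedding is assigned the relation label of its nearest training embeddings.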

[1] Pierre Zweigenbaum, et al. CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters, 2020, COLING.

[2] Jue Wang, et al. Two Are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders, 2020, EMNLP.

[3] Christopher D. Manning, et al. Contrastive Learning of Medical Visual Representations from Paired Images and Text, 2020, MLHC.

[4] Fang Liu, et al. Modeling Dense Cross-Modal Interactions for Joint Entity-Relation Extraction, 2020, IJCAI.

[5] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[6] Ce Liu, et al. Supervised Contrastive Learning, 2020, NeurIPS.

[7] Geoffrey E. Hinton, et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.

[8] Ross B. Girshick, et al. Momentum Contrast for Unsupervised Visual Representation Learning, 2020, CVPR.

[9] A. Ulges, et al. Span-based Joint Entity and Relation Extraction with Transformer Pre-training, 2019, ECAI.

[10] Ali Razavi, et al. Data-Efficient Image Recognition with Contrastive Predictive Coding, 2019, ICML.

[11] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.

[12] Tung Tran, et al. Neural Metric Learning for Fast End-to-End Relation Extraction, 2019, ArXiv.

[13] Thomas Demeester, et al. Adversarial training for multi-context joint entity and relation extraction, 2018, EMNLP.

[14] Chris Develder, et al. Joint entity recognition and relation extraction as a multi-head selection problem, 2018, Expert Systems with Applications.

[15] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[16] Max Welling, et al. Modeling Relational Data with Graph Convolutional Networks, 2017, ESWC.

[17] Nathaniel J. Smith, et al. Bootstrapping language acquisition, 2017, Cognition.

[18] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[19] Fei Li, et al. A neural joint model for entity and relation extraction from biomedical text, 2017, BMC Bioinformatics.

[20] Yue Zhang, et al. Joint Models for Extracting Adverse Drug Events from Biomedical Text, 2016, IJCAI.

[21] Makoto Miwa, et al. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures, 2016, ACL.

[22] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[23] L. Rossetti. A Simple Framework, 2015.

[24] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, ICASSP.

[25] Alessandro Moschitti, et al. Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction, 2013, ACL.

[26] Juliane Fluck, et al. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports, 2012, Journal of Biomedical Informatics.

[27] Ralph Grishman, et al. Semi-supervised Relation Extraction with Large-scale Word Clustering, 2011, ACL.

[28] Imed Zitouni, et al. Improving Mention Detection Robustness to Noisy Input, 2010, EMNLP.

[29] Steven Pinker. Language Learnability and Language Development, With New Commentary by the Author, 2009.

[30] Dan Roth, et al. Design Challenges and Misconceptions in Named Entity Recognition, 2009, CoNLL.

[31] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.

[32] ChengXiang Zhai, et al. A Systematic Exploration of the Feature Space for Relation Extraction, 2007, NAACL.

[33] Satoshi Sekine, et al. A survey of named entity recognition and classification, 2007.

[34] Imed Zitouni, et al. Factorizing Complex Models: A Case Study in Mention Detection, 2006, ACL.

[35] Ralph Grishman, et al. Extracting Relations with Integrated Information Using Kernel Methods, 2005, ACL.

[36] James R. Curran, et al. Language Independent NER using a Maximum Entropy Tagger, 2003, CoNLL.

[37] Erik F. Tjong Kim Sang, et al. Representing Text Chunks, 1999, EACL.

[38] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.