Improving Relation Extraction by Pre-trained Language Representations

Current state-of-the-art relation extraction methods typically rely on a set of lexical, syntactic, and semantic features, explicitly computed in a pre-processing step. Training feature extraction models requires additional annotated language resources, which severely restricts the applicability and portability of relation extraction to novel languages. In addition, pre-processing introduces another source of error. To address these limitations, we introduce TRE, a Transformer for Relation Extraction that extends the OpenAI Generative Pre-trained Transformer [Radford et al., 2018]. Unlike previous relation extraction models, TRE uses pre-trained deep language representations instead of explicit linguistic features to inform the relation classification, and combines them with the self-attentive Transformer architecture to effectively model long-range dependencies between entity mentions. TRE allows us to learn implicit linguistic features solely from plain text corpora by unsupervised pre-training, before fine-tuning the learned language representations on the relation extraction task. TRE obtains a new state-of-the-art result on the TACRED and SemEval 2010 Task 8 datasets, achieving test F1 scores of 67.4 and 87.1, respectively. Furthermore, we observe a significant increase in sample efficiency: with only 20% of the training examples, TRE matches the performance of our baselines and of our model trained from scratch on 100% of the TACRED dataset. We open-source our trained models, experiments, and source code.

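The abstract's approach, unsupervised pre-training of a Transformer language model followed by fine-tuning on relation extraction, can be pictured with a short sketch. The following PyTorch fragment is a minimal illustration, not the authors' implementation: the `pretrained_lm` module, the use of the final token's hidden state as a sequence summary, and all sizes are assumptions made for the example.

```python
# Minimal sketch (not the authors' TRE code) of fine-tuning a pre-trained
# Transformer language model for relation classification.
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, pretrained_lm: nn.Module, hidden_size: int, num_relations: int):
        super().__init__()
        self.lm = pretrained_lm  # Transformer pre-trained on plain text (assumed interface)
        self.head = nn.Linear(hidden_size, num_relations)  # newly initialized classifier

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.lm(token_ids)   # (batch, seq_len, hidden_size) contextual states
        summary = hidden[:, -1, :]    # final token's state as sequence summary (assumption)
        return self.head(summary)     # relation logits

# Fine-tuning updates the pre-trained weights together with the new head, e.g.:
#   model = RelationClassifier(pretrained_lm, hidden_size=768, num_relations=42)
#   loss = nn.functional.cross_entropy(model(token_ids), relation_labels)
```

Radford et al. [2018] additionally retain the language-modeling loss as an auxiliary objective during fine-tuning; the sketch above omits this and other details such as byte-pair-encoded input.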
[1] Danqi Chen et al. Position-aware Attention and Supervised Data Improve Slot Filling. EMNLP, 2017.

[2] Razvan C. Bunescu et al. A Shortest Path Dependency Kernel for Relation Extraction. HLT, 2005.

[3] Sanda M. Harabagiu et al. UTD: Classifying Semantic Relations by Combining Lexical and Semantic Resources. SemEval, 2010.

[4] Lukasz Kaiser et al. Attention Is All You Need. NIPS, 2017.

[5] Yoshua Bengio et al. Word Representations: A Simple and General Method for Semi-Supervised Learning. ACL, 2010.

[6] Andrew McCallum et al. Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction. NAACL, 2018.

[7] Zhiyuan Liu et al. Hierarchical Relation Extraction with Coarse-to-Fine Grained Attention. EMNLP, 2018.

[8] Christopher D. Manning et al. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. EMNLP, 2018.

[9] Dongyan Zhao et al. Semantic Relation Classification via Convolutional Neural Networks with Simple Negative Sampling. EMNLP, 2015.

[10] Quoc V. Le et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences. TACL, 2014.

[11] Jun Zhao et al. Relation Classification via Convolutional Deep Neural Network. COLING, 2014.

[12] Emmanuel Dupoux et al. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. TACL, 2016.

[13] Andrew Y. Ng et al. Semantic Compositionality through Recursive Matrix-Vector Spaces. EMNLP, 2012.

[14] Lukasz Kaiser et al. Generating Wikipedia by Summarizing Long Sequences. ICLR, 2018.

[15] Jun Zhao et al. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. EMNLP, 2015.

[16] Bowen Zhou et al. Improved Neural Relation Detection for Knowledge Base Question Answering. ACL, 2017.

[17] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation. EMNLP, 2014.

[18] Luke S. Zettlemoyer et al. Deep Contextualized Word Representations. NAACL, 2018.

[19] Sanja Fidler et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. ICCV, 2015.

[20] Rico Sennrich et al. Neural Machine Translation of Rare Words with Subword Units. ACL, 2016.

[21] Dmitry Zelenko et al. Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 2003.

[22] Houfeng Wang et al. Bidirectional Recurrent Convolutional Neural Network for Relation Classification. ACL, 2016.

[23] Andrew McCallum et al. Modeling Relations and Their Mentions without Labeled Text. ECML/PKDD, 2010.

[24] Jeffrey Dean et al. Efficient Estimation of Word Representations in Vector Space. ICLR, 2013.

[25] Dong Wang et al. Relation Classification via Recurrent Neural Network. arXiv, 2015.

[26] Preslav Nakov et al. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals. SEW@NAACL-HLT, 2009.

[27] Oren Etzioni et al. Identifying Relations for Open Information Extraction. EMNLP, 2011.

[28] Sebastian Ruder et al. Universal Language Model Fine-tuning for Text Classification. ACL, 2018.

[29] Zhi Jin et al. Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths. EMNLP, 2015.

[30] Heng Ji et al. Knowledge Base Population: Successful Approaches and Challenges. ACL, 2011.

[31] Daniel Jurafsky et al. Distant Supervision for Relation Extraction without Labeled Data. ACL, 2009.

[32] Alec Radford et al. Improving Language Understanding by Generative Pre-Training. 2018.

[33] Zhi Jin et al. Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation. COLING, 2016.

[34] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2015.