Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction

Distantly supervised relation extraction is widely used to extract relational facts from text, but suffers from noisy labels. Current relation extraction methods try to alleviate this noise through multi-instance learning and by providing supporting linguistic and contextual information to guide the relation classification more efficiently. While these methods achieve state-of-the-art results, we observed that they are biased towards recognizing a limited set of relations with high precision, while ignoring those in the long tail. To address this gap, we utilize a pre-trained language model, the OpenAI Generative Pre-trained Transformer (GPT) [Radford et al., 2018]. The GPT and similar models have been shown to capture semantic and syntactic features, as well as a notable amount of "common-sense" knowledge, which we hypothesize are important features for recognizing a more diverse set of relations. By extending the GPT to the distantly supervised setting and fine-tuning it on the NYT10 dataset, we show that it predicts a larger set of distinct relation types with high confidence. Manual and automated evaluation of our model shows that it achieves a state-of-the-art AUC score of 0.422 on the NYT10 dataset, and performs especially well at higher recall levels.
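The multi-instance learning setup referenced above can be illustrated with a minimal NumPy sketch of selective attention over the sentences in a bag (in the spirit of Lin et al., 2016, reference [9]). This is an illustrative toy, not the paper's implementation: all dimensions are hypothetical, and random vectors stand in for the sentence representations that a pre-trained transformer encoder would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

n_relations = 5  # toy relation inventory (hypothetical size)
d = 16           # hidden size of the (stand-in) sentence encoder

# In the real model, each sentence mentioning the same entity pair is encoded
# by the pre-trained transformer; here random vectors stand in for those
# sentence representations. A "bag" groups all sentences for one entity pair.
bag = rng.normal(size=(3, d))          # 3 sentences in the bag

# Selective attention over instances: score each sentence against a query
# vector for the candidate relation, so noisy sentences get low weight.
W = rng.normal(size=(n_relations, d))  # relation weights, reused as queries
r = 2                                  # candidate relation index

scores = bag @ W[r]                    # one scalar score per sentence
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                   # attention weights over the bag

s = alpha @ bag                        # bag representation: weighted sum
logits = W @ s
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # P(relation | bag) over all relations
```

Training then backpropagates a classification loss on `probs` through both the attention and the encoder, so the model learns to down-weight sentences whose distant label is likely wrong.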

[1] Heng Ji, et al. Knowledge Base Population: Successful Approaches and Challenges, 2011, ACL.

[2] Lukasz Kaiser, et al. Generating Wikipedia by Summarizing Long Sequences, 2018, ICLR.

[3] Daniel Jurafsky, et al. Distant supervision for relation extraction without labeled data, 2009, ACL.

[4] Ralph Grishman, et al. Distant Supervision for Relation Extraction with an Incomplete Knowledge Base, 2013, NAACL.

[5] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[6] Wei Zhang, et al. SEE: Syntax-aware Entity Embedding for Neural Relation Extraction, 2018, AAAI.

[7] Zhiyuan Liu, et al. Hierarchical Relation Extraction with Coarse-to-Fine Grained Attention, 2018, EMNLP.

[8] Leonhard Hennig, et al. Improving Relation Extraction by Pre-trained Language Representations, 2019, AKBC.

[9] Zhiyuan Liu, et al. Neural Relation Extraction with Selective Attention over Instances, 2016, ACL.

[10] Quoc V. Le, et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences, 2014, TACL.

[11] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[12] Distant Supervision for Relation Extraction with Matrix Completion, 2014, ACL.

[13] David Bamman, et al. Adversarial Training for Relation Extraction, 2017, EMNLP.

[14] Yoshua Bengio, et al. Word Representations: A Simple and General Method for Semi-Supervised Learning, 2010, ACL.

[15] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[16] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.

[17] Ramesh Nallapati, et al. Multi-instance Multi-label Learning for Relation Extraction, 2012, EMNLP.

[18] Andrew McCallum, et al. Modeling Relations and Their Mentions without Labeled Text, 2010, ECML/PKDD.

[19] Wei Zhang, et al. Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction, 2018, EMNLP.

[20] Wei Zhang, et al. Label-Free Distant Supervision for Relation Extraction via Knowledge Graph Embedding, 2018, EMNLP.

[21] Jun Zhao, et al. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks, 2015, EMNLP.

[22] Bowen Zhou, et al. Improved Neural Relation Detection for Knowledge Base Question Answering, 2017, ACL.

[23] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[24] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[25] Andrew McCallum, et al. Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction, 2018, NAACL.

[26] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, ICCV.

[27] Andrew Y. Ng, et al. Semantic Compositionality through Recursive Matrix-Vector Spaces, 2012, EMNLP.

[28] Zhifang Sui, et al. A Soft-label Method for Noise-tolerant Distantly Supervised Relation Extraction, 2017, EMNLP.

[29] Heike Adel, et al. Noise Mitigation for Neural Entity Typing and Relation Extraction, 2016, EACL.

[30] Dongyan Zhao, et al. Semantic Relation Classification via Convolutional Neural Networks with Simple Negative Sampling, 2015, EMNLP.

[31] Christopher D. Manning, et al. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction, 2018, EMNLP.

[32] Jun Zhao, et al. Relation Classification via Convolutional Deep Neural Network, 2014, COLING.

[33] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[34] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.

[35] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.

[36] Chiranjib Bhattacharyya, et al. RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information, 2018, EMNLP.

[37] Luke S. Zettlemoyer, et al. Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations, 2011, ACL.

[38] Dongyan Zhao, et al. Learning with Noise: Enhance Distantly Supervised Relation Extraction with Dynamic Transition Matrix, 2017, ACL.

[39] Zhi Jin, et al. Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths, 2015, EMNLP.

[40] William Yang Wang, et al. DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction, 2018, ACL.

[41] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[42] Weijia Jia, et al. Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning, 2018, EMNLP.

[43] Dong Wang, et al. Relation Classification via Recurrent Neural Network, 2015, ArXiv.

[44] Razvan C. Bunescu, et al. A Shortest Path Dependency Kernel for Relation Extraction, 2005, HLT.