DIAG-NRE: A Deep Pattern Diagnosis Framework for Distant Supervision Neural Relation Extraction

Modern neural network models have achieved the state-of-the-art performance on relation extraction (RE) tasks. Although distant supervision (DS) can automatically generate training labels for RE, the effectiveness of DS highly depends on datasets and relation types, and sometimes it may introduce large labeling noises. In this paper, we propose a deep pattern diagnosis framework, DIAG-NRE, that aims to diagnose and improve neural relation extraction (NRE) models trained on DS-generated data. DIAG-NRE includes three stages: (1) The deep pattern extraction stage employs reinforcement learning to extract regular-expression-style patterns from NRE models. (2) The pattern refinement stage builds a pattern hierarchy to find the most representative patterns and lets human reviewers evaluate them quantitatively by annotating a certain number of pattern-matched examples. In this way, we minimize both the number of labels to annotate and the difficulty of writing heuristic patterns. (3) The weak label fusion stage fuses multiple weak label sources, including DS and refined patterns, to produce noise-reduced labels that can train a better NRE model. To demonstrate the broad applicability of DIAG-NRE, we use it to diagnose 14 relation types of two public datasets with one simple hyper-parameter configuration. We observe different noise behaviors and obtain significant F1 improvements on all relation types suffering from large labeling noises.

[1]  Xiang Zhang,et al.  Automatically Labeled Data Generation for Large Scale Event Extraction , 2017, ACL.

[2]  Heng Ji,et al.  Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach , 2017, EMNLP.

[3]  Luke S. Zettlemoyer,et al.  Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations , 2011, ACL.

[4]  William Yang Wang,et al.  Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning , 2018, ACL.

[5]  Andrew McCallum,et al.  Modeling Relations and Their Mentions without Labeled Text , 2010, ECML/PKDD.

[6]  Daniel Jurafsky,et al.  Understanding Neural Networks through Representation Erasure , 2016, ArXiv.

[7]  Dongyan Zhao,et al.  Learning with Noise: Enhance Distantly Supervised Relation Extraction with Dynamic Transition Matrix , 2017, ACL.

[8]  Zhi Jin,et al.  Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths , 2015, EMNLP.

[9]  Dmitry Zelenko,et al.  Kernel methods for relation extraction , 2003 .

[10]  Christopher Ré,et al.  Snorkel: Rapid Training Data Creation with Weak Supervision , 2017, Proc. VLDB Endow..

[11]  Hiroshi Nakagawa,et al.  Reducing Wrong Labels in Distant Supervision for Relation Extraction , 2012, ACL.

[12]  Dmitry Zelenko,et al.  Kernel Methods for Relation Extraction , 2002, J. Mach. Learn. Res..

[13]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[14]  Mark Craven,et al.  Constructing Biological Knowledge Bases by Extracting Information from Text Sources , 1999, ISMB.

[15]  Christopher D. Manning,et al.  Leveraging Linguistic Structure For Open Domain Information Extraction , 2015, ACL.

[16]  Jun Zhao,et al.  Relation Classification via Convolutional Deep Neural Network , 2014, COLING.

[17]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[18]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[19]  Tsuyoshi Murata,et al.  {m , 1934, ACML.

[20]  Raymond J. Mooney,et al.  Relational Learning of Pattern-Match Rules for Information Extraction , 1999, CoNLL.

[21]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[22]  Christopher D. Manning,et al.  Combining Distant and Partial Supervision for Relation Extraction , 2014, EMNLP.

[23]  Christopher Ré,et al.  Learning the Structure of Generative Models without Labeled Data , 2017, ICML.

[24]  Wei Shi,et al.  Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification , 2016, ACL.

[25]  Christopher Ré,et al.  Big Data versus the Crowd: Looking for Relationships in All the Right Places , 2012, ACL.

[26]  Samyam Rajbhandari,et al.  LONG SHORT-TERM MEMORY , 2018 .

[27]  David Bamman,et al.  Adversarial Training for Relation Extraction , 2017, EMNLP.

[28]  Zhiyuan Liu,et al.  Relation Classification via Multi-Level Attention CNNs , 2016, ACL.

[29]  Li Zhao,et al.  Reinforcement Learning for Relation Classification From Noisy Data , 2018, AAAI.

[30]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[31]  Zhiyuan Liu,et al.  Neural Relation Extraction with Selective Attention over Instances , 2016, ACL.

[33]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[34]  Luciano Del Corro,et al.  ClausIE: clause-based open information extraction , 2013, WWW.

[35]  Ramesh Nallapati,et al.  Multi-instance Multi-label Learning for Relation Extraction , 2012, EMNLP.

[36]  Oren Etzioni,et al.  Open Language Learning for Information Extraction , 2012, EMNLP.

[37]  Zhiyuan Liu,et al.  Denoising Distantly Supervised Open-Domain Question Answering , 2018, ACL.

[38]  Jun Zhao,et al.  Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks , 2015, EMNLP.

[39]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[40]  Angli Liu,et al.  Effective Crowd Annotation for Relation Extraction , 2016, NAACL.

[41]  Ido Dagan,et al.  Creating a Large Benchmark for Open Information Extraction , 2016, EMNLP.

[42]  Jiawei Han,et al.  MetaPAD: Meta Pattern Discovery from Massive Text Corpora , 2017, KDD.

[43]  Christopher De Sa,et al.  Data Programming: Creating Large Training Sets, Quickly , 2016, NIPS.

[44]  Gerhard Weikum,et al.  PATTY: A Taxonomy of Relational Patterns with Semantic Types , 2012, EMNLP.

[45]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[46]  Jian Su,et al.  Exploring Various Knowledge in Relation Extraction , 2005, ACL.

[47]  Makoto Miwa,et al.  End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures , 2016, ACL.

[48]  Bowen Zhou,et al.  Classifying Relations by Ranking with Convolutional Neural Networks , 2015, ACL.

[49]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[50]  Li Zhao,et al.  Learning Structured Representation for Text Classification via Reinforcement Learning , 2018, AAAI.

[51]  Ralph Grishman,et al.  Infusion of Labeled Data into Distant Supervision for Relation Extraction , 2014, ACL.

[52]  Ming Zhou,et al.  Neural Open Information Extraction , 2018, ACL.

[53]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[54]  Christopher De Sa,et al.  Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data , 2016, 1610.08123.

[55]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..