Finding Influential Instances for Distantly Supervised Relation Extraction

Distant supervision is widely used to scale up relation extraction, but it often introduces substantial label noise. In this work, we propose REIF, a novel model-agnostic instance subsampling method for distantly supervised relation extraction that bridges the gap between influence-based subsampling and deep learning. It comprises two key steps: first, computing instance-level influences that measure how much each training instance contributes to the change in the model's validation loss; second, deriving sampling probabilities via a proposed sigmoid sampling function to perform batch-in-bag sampling. We design a fast influence subsampling scheme that reduces the computational complexity from O(mn) to O(1), and analyze its robustness when the sigmoid sampling function is employed. Empirical experiments demonstrate that our method outperforms the baselines and supports interpretable instance selection.
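The two-step procedure above can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's implementation: the per-instance influence scores are taken as given (in REIF they come from instance-level influence functions, following the sign convention that a positive influence means keeping the instance increases validation loss), and the exact form of the sigmoid sampling function and the `scale` parameter are guesses for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sampling_probs(influences, scale=1.0):
    """Hypothetical sigmoid sampling function.

    Instances whose presence increases validation loss (influence > 0,
    i.e. likely noisy labels) get low sampling probability; instances
    that decrease it (influence < 0) get high probability.
    """
    return [sigmoid(-scale * inf) for inf in influences]

def batch_in_bag_sample(bag, influences, scale=1.0, rng=None):
    """Keep each instance in a bag independently with its sigmoid
    probability, retaining at least one instance so the bag is never empty."""
    rng = rng or random.Random(0)
    probs = sampling_probs(influences, scale)
    kept = [x for x, p in zip(bag, probs) if rng.random() < p]
    if not kept:
        # fall back to the single most trustworthy instance
        kept = [max(zip(bag, probs), key=lambda t: t[1])[0]]
    return kept
```

For example, a bag of three sentence instances with influences `[-2.0, 0.0, 2.0]` yields monotonically decreasing keep probabilities, so the likely-mislabeled third instance is the one most often dropped.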
