A Crowdsourcing Based Human-in-the-Loop Framework for Denoising UUs in Relation Extraction Tasks

In relation extraction tasks, distant supervision methods expand the dataset by aligning entity pairs across knowledge bases and completing the relations between the two entities. However, these methods ignore the fact that sentence labels generated by distant supervision with high confidence are often incorrect in the real world; such cases are called Unknown Unknowns (UUs). To deal with this challenge, we propose a crowdsourcing-based human-in-the-loop denoising framework that iteratively discovers UUs and corrects them through crowdsourcing to better extract relations. In each epoch, we choose one sentence bag and repeat two steps. First, an attention-based Long Short-Term Memory network is applied as a selector to discover potential UUs. Second, these UUs are annotated by crowdsourcing with two answer-collecting strategies and fed back into the selector as positive samples. The two steps repeat until the accuracy of the selector reaches a threshold, at which point all annotated samples are added to the relation classifier as a cleaned training set and the framework moves on to the next epoch with new sentence bags. Experiments on the New York Times dataset and an analysis of potential UUs demonstrate that our framework denoises the dataset and outperforms all the baselines on distant supervision relation extraction tasks.
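
The per-bag loop described in the abstract could be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the function names (`denoise_bag`, `majority_vote`), the callback parameters (`select`, `annotate`, `feed_back`, `accuracy`), and the `max_rounds` safeguard are all assumptions introduced here. In particular, majority voting stands in for the two answer-collecting strategies, which the abstract does not specify, and the selector is passed in as an opaque callable rather than an attention-based LSTM.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def majority_vote(answers: Iterable[str]) -> str:
    """Aggregate crowd answers for one sentence by simple majority.

    Assumption: a stand-in for the paper's answer-collecting strategies.
    """
    return Counter(answers).most_common(1)[0][0]


def denoise_bag(
    bag: List[str],
    select: Callable[[List[str]], List[str]],     # hypothetical selector: returns potential UUs
    annotate: Callable[[str], List[str]],         # hypothetical crowd: returns worker answers
    feed_back: Callable[[Dict[str, str]], None],  # hypothetical selector update step
    accuracy: Callable[[], float],                # hypothetical selector accuracy probe
    threshold: float,
    max_rounds: int = 10,                         # assumption: guard against non-convergence
) -> Dict[str, str]:
    """Repeat discover -> annotate -> feed back until the selector is accurate enough."""
    annotated: Dict[str, str] = {}
    for _ in range(max_rounds):
        # Step 1: the selector proposes potential UUs from the sentence bag.
        for sentence in select(bag):
            # Step 2: each new candidate is labeled once by aggregating crowd answers.
            if sentence not in annotated:
                annotated[sentence] = majority_vote(annotate(sentence))
        # Corrected samples go back into the selector.
        feed_back(annotated)
        if accuracy() >= threshold:
            break
    # The caller would add these to the relation classifier's cleaned training set.
    return annotated
```

Passing the selector and the crowd as callables keeps the control flow visible without committing to any particular model or crowdsourcing platform.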
