Scalpel-CD: Leveraging Crowdsourcing and Deep Probabilistic Modeling for Debugging Noisy Training Data

This paper presents Scalpel-CD, a first-of-its-kind system that leverages both human and machine intelligence to debug noisy labels in the training data of machine learning systems. The system identifies potentially wrong labels using a deep probabilistic model, which infers the latent class of a high-dimensional data instance by exploiting the data distribution in the underlying latent feature space. To minimize crowd effort, it employs a data sampler that selects the data instances that would benefit the most from being inspected by the crowd. The manually verified labels are then propagated to similar instances in the original training data by exploiting the underlying data structure, thus scaling out the contribution of the crowd. Scalpel-CD is designed with a set of algorithmic solutions to automatically search for the optimal configuration for different types of training data, in terms of the underlying data structure, noise ratio, and noise type (random vs. structural). In a real deployment on multiple machine learning tasks, we demonstrate that Scalpel-CD improves label quality by 12.9% with only 2.8% of instances inspected by the crowd.
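The three-stage pipeline sketched in the abstract (flag suspicious labels in a latent space, sample a small budget for crowd inspection, propagate verified labels to neighboring instances) can be illustrated with a minimal, self-contained Python sketch. This is not the authors' implementation: PCA stands in for the deep probabilistic model, the suspicion score, budget, neighbor count, and synthetic data are all assumptions made purely for illustration, and the crowd is simulated with ground-truth labels.

```python
# Illustrative sketch (not the paper's implementation): detect suspicious labels
# in a latent space, inspect a small budget of them, and propagate the verified
# labels to nearby instances.
import numpy as np
from sklearn.decomposition import PCA            # simple stand-in for the deep probabilistic model
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# --- synthetic noisy training set (placeholder data) ---
X = np.vstack([rng.normal(0, 1, (200, 50)), rng.normal(3, 1, (200, 50))])
y_true = np.array([0] * 200 + [1] * 200)
y_noisy = y_true.copy()
flip = rng.choice(len(y_noisy), size=40, replace=False)   # 10% random label noise
y_noisy[flip] = 1 - y_noisy[flip]

# --- 1. infer a latent representation (PCA used here as a proxy) ---
z = PCA(n_components=5, random_state=0).fit_transform(X)

# --- 2. score instances: distance to own-class centroid vs. the other centroid ---
centroids = np.stack([z[y_noisy == c].mean(axis=0) for c in (0, 1)])
d_own = np.linalg.norm(z - centroids[y_noisy], axis=1)
d_other = np.linalg.norm(z - centroids[1 - y_noisy], axis=1)
suspicion = d_own - d_other                      # higher => label looks more likely to be wrong

# --- 3. send a small budget of the most suspicious instances to the "crowd" ---
budget = 12
to_inspect = np.argsort(-suspicion)[:budget]
verified = y_true[to_inspect]                    # crowd answers simulated by ground truth

# --- 4. propagate verified labels to nearest neighbors in the latent space ---
y_clean = y_noisy.copy()
y_clean[to_inspect] = verified
nn = NearestNeighbors(n_neighbors=6).fit(z)
_, neigh = nn.kneighbors(z[to_inspect])
for idx, ns in zip(to_inspect, neigh):
    y_clean[ns] = y_clean[idx]                   # overwrite neighbors' labels with the verified one

print("label accuracy before:", (y_noisy == y_true).mean())
print("label accuracy after: ", (y_clean == y_true).mean())
```

In the sketch, the gain in label accuracy comes almost entirely from step 4: each crowd-verified label corrects several neighboring instances, which mirrors how the paper scales out a small amount of crowd effort across the full training set.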
