Interactive Label Cleaning with Example-based Explanations

We tackle sequential learning under label noise in applications where a human supervisor can be queried to relabel suspicious examples. Existing approaches are flawed in that they relabel only those incoming examples that look “suspicious” to the model. As a consequence, mislabeled examples that elude (or do not undergo) this cleaning step end up tainting the training data and the model, with no further chance of being cleaned. We propose CINCER, a novel approach that cleans both new and past data by identifying pairs of mutually incompatible examples. Whenever it detects a suspicious example, CINCER identifies a counter-example in the training set that, according to the model, is maximally incompatible with the suspicious example, and asks the annotator to relabel either or both examples, resolving this possible inconsistency. The counter-examples are chosen to be maximally incompatible, so as to serve as explanations of the model's suspicion, and highly influential, so as to convey as much information as possible if relabeled. CINCER achieves this by leveraging an efficient and robust approximation of influence functions based on the Fisher information matrix (FIM). Our extensive empirical evaluation shows that clarifying the reasons behind the model's suspicions by cleaning the counter-examples helps in acquiring substantially better data and models, especially when paired with our FIM approximation.
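
To make the counter-example selection step concrete, here is a minimal sketch in Python of how such a step could look under simplifying assumptions: a logistic-regression model, the empirical Fisher F = (1/n) Σ gᵢgᵢᵀ as a stand-in for the FIM, and the influence of a training example z_j on a suspicious example z_s approximated as g_sᵀ F⁻¹ g_j. The function names (per_example_grads, fisher_influence, pick_counter_example) are illustrative and not the paper's actual API.

```python
import numpy as np

def per_example_grads(X, y, w):
    """Per-example gradients of the logistic loss w.r.t. the weights w.
    For logistic regression, d loss / d w = (sigmoid(x . w) - y) * x."""
    p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted probabilities, shape (n,)
    return (p - y)[:, None] * X             # shape (n, d): one gradient per example

def fisher_influence(g_suspect, G_train, damping=1e-3):
    """Score each training example j by g_s^T F^{-1} g_j, where F is the
    (damped) empirical Fisher built from per-example training gradients."""
    n, d = G_train.shape
    F = G_train.T @ G_train / n + damping * np.eye(d)
    v = np.linalg.solve(F, g_suspect)       # v = F^{-1} g_s
    return G_train @ v                      # shape (n,): one score per training example

def pick_counter_example(X_tr, y_tr, x_s, y_s, w):
    """Among training points whose label conflicts with the suspicious
    example's label, return the index of the most influential one."""
    G = per_example_grads(X_tr, y_tr, w)
    g_s = per_example_grads(x_s[None, :], np.array([y_s]), w)[0]
    scores = fisher_influence(g_s, G)
    candidates = np.where(y_tr != y_s)[0]   # examples carrying a conflicting label
    return candidates[np.argmax(np.abs(scores[candidates]))]

# Toy usage on synthetic data with pretend-trained weights.
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(50, 3))
y_tr = (X_tr[:, 0] > 0).astype(float)
w = np.array([1.0, 0.0, 0.0])
j = pick_counter_example(X_tr, y_tr, X_tr[7], y_tr[7], w)
print("counter-example index:", j)
```

In the interactive loop described above, the selected counter-example and the suspicious example would then be shown to the annotator, who can relabel either or both. Note that solving against the damped empirical Fisher is cubic in the parameter dimension, which is fine for this toy model but not for deep networks; the paper's FIM-based approximation targets exactly that scalability gap, which this sketch does not attempt to reproduce.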
