Approximating Instance-Dependent Noise via Instance-Confidence Embedding

Label noise in multiclass classification is a major obstacle to the deployment of learning systems. Unlike the widely used class-conditional noise (CCN) assumption, under which the noisy label is independent of the input feature given the true label, label noise in real-world datasets can be aleatory and heavily dependent on individual instances. In this work, we investigate the instance-dependent noise (IDN) model and propose an efficient approximation of IDN that captures instance-specific label corruption. Concretely, noting that most columns of the IDN transition matrix have only limited influence on class-posterior estimation, we propose a variational approximation that uses a single scalar confidence parameter. Because the mapping from an instance to its confidence value can vary significantly even between adjacent instances, we use an instance embedding that assigns a trainable confidence parameter to each training instance. The resulting instance-confidence embedding (ICE) method not only performs well under label noise but also effectively detects ambiguous or mislabeled instances. We validate its utility on various image and text classification tasks.
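
The following is a minimal sketch of how a per-instance trainable confidence could be wired into a standard classifier, in the spirit of the abstract. The `ICEWrapper` class, the mixture of the clean posterior with a uniform distribution, and the initialization constants are illustrative assumptions on our part, not the paper's exact variational formulation; only the idea of one trainable confidence scalar per training instance comes from the text above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ICEWrapper(nn.Module):
    """Hypothetical sketch: wrap a base classifier with one trainable
    confidence scalar per training instance (the instance embedding)."""

    def __init__(self, base_model: nn.Module, num_train_samples: int, num_classes: int):
        super().__init__()
        self.base_model = base_model
        self.num_classes = num_classes
        # One trainable scalar per training instance.
        self.confidence = nn.Embedding(num_train_samples, 1)
        nn.init.constant_(self.confidence.weight, 3.0)  # sigmoid(3.0) ~ 0.95, i.e. start confident

    def forward(self, x, sample_idx):
        logits = self.base_model(x)                     # clean-posterior logits
        p_clean = F.softmax(logits, dim=-1)             # estimate of p(y | x)
        c = torch.sigmoid(self.confidence(sample_idx))  # per-instance confidence in (0, 1)
        uniform = torch.full_like(p_clean, 1.0 / self.num_classes)
        # Assumed approximation of the noisy posterior: confident instances keep
        # the clean prediction, low-confidence instances fall back to uniform noise.
        p_noisy = c * p_clean + (1.0 - c) * uniform
        return p_noisy, c


def ice_loss(p_noisy, noisy_labels):
    # Negative log-likelihood of the observed (possibly corrupted) labels
    # under the approximate noisy posterior.
    return F.nll_loss(torch.log(p_noisy + 1e-12), noisy_labels)
```

Under this reading, the learned confidence `c` is the quantity one would inspect to flag ambiguous or mislabeled instances, while at test time only the base classifier's softmax output would be used for prediction.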
