Confidence Scores Make Instance-dependent Label-noise Learning Possible

Learning with noisy labels has drawn much attention. Most recent works in this area consider only class-conditional noise, where the label noise is independent of the input features. This noise model may not be faithful to many real-world applications. A few pioneering works have instead studied instance-dependent noise, but those methods rely on strong assumptions about the noise model. To alleviate this issue, we introduce confidence-scored instance-dependent noise (CSIDN), where each instance-label pair is associated with a confidence score. The confidence scores are sufficient to estimate the noise functions of each instance with minimal assumptions, and such scores can be easily and cheaply derived during the construction of the dataset through crowdsourcing or automatic annotation. To handle CSIDN, we design a benchmark algorithm termed instance-level forward correction. Empirical results on synthetic and real-world datasets demonstrate the utility of the proposed method.
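To make the correction step concrete, the sketch below illustrates a generic instance-level forward correction loss: the model's clean-class posterior for each instance is pushed through that instance's own transition matrix before the cross-entropy against the observed noisy label is computed. This is a minimal sketch under stated assumptions; how the per-instance transition matrices are estimated from the confidence scores is the paper's contribution and is not reproduced here, and the function name and array layout are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def instance_level_forward_correction_loss(clean_posteriors, transition_matrices, noisy_labels):
    """Forward-corrected cross-entropy with a per-instance transition matrix.

    clean_posteriors:     (n, k) model estimates of p(y = j | x_i)
    transition_matrices:  (n, k, k) hypothetical T_i, where T_i[j, l] ~ p(noisy y = l | clean y = j, x_i)
    noisy_labels:         (n,) observed noisy labels
    """
    # Map each clean-class posterior through its instance's transition matrix to get
    # the predicted noisy-label distribution: p(noisy y | x_i) = T_i^T p(y | x_i).
    noisy_posteriors = np.einsum('nj,njl->nl', clean_posteriors, transition_matrices)
    # Standard cross-entropy against the observed noisy labels.
    picked = noisy_posteriors[np.arange(len(noisy_labels)), noisy_labels]
    return -np.mean(np.log(np.clip(picked, 1e-12, None)))

# Toy usage: 2 instances, 3 classes, with made-up per-instance transition matrices.
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
T = np.stack([0.9 * np.eye(3) + 0.05 * (1 - np.eye(3)),
              0.8 * np.eye(3) + 0.10 * (1 - np.eye(3))])
y_noisy = np.array([0, 2])
print(instance_level_forward_correction_loss(p, T, y_noisy))
```

The only difference from standard (class-conditional) forward correction is that each instance carries its own transition matrix instead of sharing a single global one.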
