Robust early-learning: Hindering the memorization of noisy labels

The memorization effect of deep networks shows that they first memorize training data with clean labels and only later the data with noisy labels. Early stopping can therefore be exploited for learning with noisy labels. However, the side effects of noisy labels degrade the memorization of clean labels even before early stopping takes effect. In this paper, motivated by the lottery ticket hypothesis, which shows that only a subset of parameters matters for generalization, we find that only some parameters are important for fitting clean labels and generalizing well, which we term critical parameters; the remaining parameters tend to fit noisy labels and generalize poorly, which we term non-critical parameters. Based on this, we propose robust early-learning to reduce the side effects of noisy labels before early stopping and thus enhance the memorization of clean labels. Specifically, in each iteration we divide all parameters into critical and non-critical ones and then apply different update rules to each type. Extensive experiments on benchmark-simulated and real-world label-noise datasets demonstrate the superiority of the proposed method over state-of-the-art label-noise learning methods.
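
To make the per-iteration update rule concrete, the sketch below is a hypothetical PyTorch rendering, not the authors' released implementation. It assumes the importance of each parameter is scored by the magnitude of gradient times parameter, that a fixed fraction keep_ratio of parameters is treated as critical, and that non-critical parameters receive only a small decay toward zero instead of a gradient step; all of these choices are illustrative assumptions drawn from the abstract's description.

```python
# Hypothetical sketch of one robust early-learning update step.
# The |gradient * parameter| criterion, keep_ratio, and decay are
# illustrative assumptions, not the authors' exact implementation.
import torch


def robust_early_learning_step(model, loss, lr=0.01, keep_ratio=0.6, decay=1e-4):
    """Update critical parameters by gradient descent; only decay the rest."""
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            # Assumed importance score, echoing the lottery-ticket intuition
            # that only a subset of weights matters for generalization.
            score = (p.grad * p).abs()
            k = max(1, int(keep_ratio * score.numel()))
            # Threshold separating the top-k (critical) parameters from the rest.
            threshold = score.flatten().kthvalue(score.numel() - k + 1).values
            critical = score >= threshold
            # Critical parameters: standard gradient step, so they keep fitting
            # the (clean) labels. Non-critical parameters: no gradient step,
            # only shrinkage toward zero, hindering them from fitting noise.
            update = torch.where(critical, p.grad, decay * p)
            p.add_(update, alpha=-lr)
```

In training, such a step would take the place of the usual optimizer step; how the critical fraction is chosen or scheduled over epochs is not specified by the abstract and would follow the paper's full description.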
