Noisy Labels Can Induce Good Representations

The current success of deep learning depends on large-scale labeled datasets. In practice, high-quality annotations are expensive to collect, whereas noisy annotations are far more affordable. Prior work reports mixed empirical results for training with noisy labels: neural networks can easily memorize random labels, yet they can also generalize after being trained on noisy labels. To explain this puzzle, we study how architecture affects learning with noisy labels. We observe that if an architecture “suits” the task, training with noisy labels can induce useful hidden representations even when the model generalizes poorly; that is, the damage from noisy labels is concentrated in the last few layers of the model. This finding leads to a simple method for improving models trained on noisy labels: replace the final dense layers with a linear model whose weights are learned from a small set of clean data. We empirically validate our findings across three architectures (Convolutional Neural Networks, Graph Neural Networks, and Multi-Layer Perceptrons) and two domains (graph algorithmic tasks and image classification). Furthermore, combining our method with existing approaches to noisy-label training achieves state-of-the-art results on image classification benchmarks.
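As a rough illustration of the proposed fix (a minimal sketch, not the paper's experimental setup), the code below trains a small MLP on synthetically corrupted labels, then discards the noisily trained dense head and fits a logistic-regression head on the frozen representation using a handful of clean labels. The synthetic data generator, the 40% noise rate, the 50-example clean set, and names such as `backbone` and `head` are illustrative assumptions.

```python
# Minimal sketch: train on noisy labels, then relearn only the final layer on clean data.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)

# Synthetic 2-class data: x in R^20, true label from a fixed linear rule (illustrative only).
n, d = 2000, 20
X = torch.randn(n, d)
w_true = torch.randn(d)
y_clean = (X @ w_true > 0).long()

# Corrupt 40% of the training labels uniformly at random.
y_noisy = y_clean.clone()
flip = torch.rand(n) < 0.4
y_noisy[flip] = 1 - y_noisy[flip]

# Backbone (feature extractor) and dense head, trained jointly on the noisy labels.
backbone = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 2)
opt = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(head(backbone(X)), y_noisy)
    loss.backward()
    opt.step()

# The fix: keep the learned representation, discard the noisily trained head,
# and fit a linear model on a small clean subset (here 50 points).
with torch.no_grad():
    feats = backbone(X).numpy()
y_clean_np = y_clean.numpy()
clean_idx = torch.randperm(n)[:50].numpy()
linear_head = LogisticRegression(max_iter=1000).fit(feats[clean_idx], y_clean_np[clean_idx])

# Compare both heads against the clean labels.
with torch.no_grad():
    acc_noisy_head = (head(backbone(X)).argmax(1) == y_clean).float().mean().item()
acc_linear_head = linear_head.score(feats, y_clean_np)
print(f"noisy-trained head:    {acc_noisy_head:.3f}")
print(f"relearned linear head: {acc_linear_head:.3f}")
```

Only the linear head is re-estimated; the backbone stays frozen, which is why a clean set that is tiny relative to the noisy training set can suffice.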
