Which is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise?

In real life, accurately annotating large-scale datasets is often difficult, so datasets used to train deep learning models are likely to contain label noise. To make use of such datasets, two typical families of methods have been proposed. One employs semi-supervised learning, treating confident examples as labeled data and unconfident examples as unlabeled data. The other models the label noise and designs statistically consistent classifiers. A natural question remains unanswered: which one should be used for a specific real-world application? In this paper, we answer this question from the perspective of the causal data-generating process. Specifically, the performance of the semi-supervised method depends heavily on the data-generating process, while the label-noise modeling method is not influenced by it. For example, if a given dataset has a causal generative structure in which the features cause the label, the semi-supervised method would not be helpful. When the causal structure is unknown, we provide an intuitive method to discover it for a given dataset containing label noise.
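As background, here is a minimal sketch of the standard formulation that "modeling label noise" usually refers to (a common setup in the literature, not a claim about this paper's exact method): the noisy label $\tilde{Y}$ is assumed to be generated from the clean label $Y$ through a class-conditional transition matrix $T$,

$$T_{ij} = P(\tilde{Y}=j \mid Y=i), \qquad P(\tilde{Y}\mid X) = T^{\top} P(Y\mid X),$$

so that a classifier trained on noisy data with a $T$-corrected loss is statistically consistent, i.e., it converges to the optimal classifier for the clean distribution as the sample size grows. The semi-supervised alternative, by contrast, discards the labels of unconfident examples and trains on them as unlabeled data, so its effectiveness can hinge on how the data were generated.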
