Meta Label Correction for Noisy Label Learning

Leveraging weak or noisy supervision to build effective machine learning models has long been an important research problem. Its importance has grown further in recent years due to the increasing need for large-scale datasets to train deep learning models. Weak or noisy supervision can originate from multiple sources, including non-expert annotators and automatic labeling based on heuristics or user-interaction signals. There is an extensive body of previous work on learning with noisy labels. Most notably, recent work has shown impressive gains from meta-learned instance re-weighting, in which a meta-learning framework assigns per-instance weights to noisy examples. In this paper, we extend this line of work by posing the problem as label correction within a meta-learning framework. We view label correction as a meta-process and propose MLC (Meta Label Correction), a new meta-learning-based framework for learning with noisy labels. Specifically, a label correction network serves as a meta-model that produces corrected labels for noisy examples, while the main model is trained on the corrected labels. The two models are trained jointly by solving a bi-level optimization problem: the meta-model's parameters are chosen so that the main-model weights obtained by training on the corrected labels minimize the loss on a small clean dataset. We run extensive experiments with different label-noise levels and types on both image recognition and text classification tasks. Comparing the re-weighting and correction approaches, we show that the correction framing addresses some of the limitations of re-weighting. We also show that the proposed MLC approach achieves large improvements over previous methods in many settings.
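
To make the bi-level setup concrete, the sketch below shows one training step in PyTorch. It is a minimal illustration under assumed naming (mlc_meta_step, meta_model, inner_lr, and soft_ce are ours for exposition, not released code), assuming a meta-model that takes a noisy example and its noisy label and returns a soft corrected label. A virtual SGD step on the main model is kept differentiable so that the clean-data loss of the virtually updated model can be backpropagated into the meta-model; the main model is then updated for real on the corrected labels.

    import torch
    import torch.nn.functional as F
    from torch.func import functional_call  # PyTorch >= 2.0


    def soft_ce(logits, soft_targets):
        # Cross-entropy against soft (corrected) labels.
        return torch.sum(-soft_targets * F.log_softmax(logits, dim=-1), dim=-1).mean()


    def mlc_meta_step(main_model, meta_model, main_opt, meta_opt,
                      x_noisy, y_noisy, x_clean, y_clean, inner_lr=0.1):
        # 1) Meta-model proposes corrected soft labels for the noisy batch.
        corrected = meta_model(x_noisy, y_noisy)

        # 2) Virtual inner step: one differentiable SGD step on the main model,
        #    keeping the graph so gradients can flow back to the meta-model.
        params = dict(main_model.named_parameters())
        logits = functional_call(main_model, params, (x_noisy,))
        inner_loss = soft_ce(logits, corrected)
        grads = torch.autograd.grad(inner_loss, tuple(params.values()),
                                    create_graph=True)
        virtual = {name: p - inner_lr * g
                   for (name, p), g in zip(params.items(), grads)}

        # 3) Outer step: the clean-data loss of the virtually updated main model
        #    backpropagates through the inner step into the meta-model.
        meta_loss = F.cross_entropy(
            functional_call(main_model, virtual, (x_clean,)), y_clean)
        meta_opt.zero_grad()
        meta_loss.backward()
        meta_opt.step()

        # 4) Real update of the main model on freshly corrected labels.
        with torch.no_grad():
            corrected = meta_model(x_noisy, y_noisy)
        main_loss = soft_ce(main_model(x_noisy), corrected)
        main_opt.zero_grad()
        main_loss.backward()
        main_opt.step()
        return main_loss.item(), meta_loss.item()

In practice the virtual step mirrors the main optimizer and the meta-update alternates with ordinary training batches; the single plain-SGD inner step above is the simplest instance of the bi-level scheme described in the abstract.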
