Improve Noise Tolerance of Robust Loss via Noise-Awareness

Robust loss minimization is an important strategy for handling the problem of learning with noisy labels. Current approaches to designing robust losses introduce noise-robust factors, i.e., hyperparameters, to control the trade-off between noise robustness and learnability. However, finding suitable hyperparameters for different datasets with noisy labels is challenging and time-consuming. Moreover, existing robust loss methods usually assume that all training samples share common hyperparameters that are independent of instances. This limits their ability to distinguish the individual noise properties of different samples and overlooks the varying contributions that diverse training samples make to helping the model grasp underlying patterns. To address these issues, we propose to equip robust losses with instance-dependent hyperparameters, which improves their noise tolerance with a theoretical guarantee. To set such instance-dependent hyperparameters, we propose a meta-learning method that adaptively learns a hyperparameter prediction function, called the Noise-Aware Robust Loss Adjuster (NARL-Adjuster for brevity). Through mutual amelioration between the hyperparameter prediction function and the classifier parameters, both can be jointly refined and coordinated to attain solutions with good generalization capability. We integrate four state-of-the-art robust loss functions with our algorithm, and comprehensive experiments substantiate the general applicability and effectiveness of the proposed method in terms of both noise tolerance and performance.
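
To make the bi-level training pattern sketched in the abstract concrete, the short PyTorch sketch below shows one plausible way such an adjuster could be trained; it is not the authors' released implementation. It assumes a Generalized Cross Entropy (GCE) robust loss whose exponent q is predicted per sample by a small adjuster network from that sample's cross-entropy loss, and a small clean meta set for the meta-update. The names gce_loss, Adjuster, and train_step, the toy linear classifier, and all hyperparameter values are hypothetical choices for illustration.

# Minimal sketch (assumption: GCE robust loss, per-sample exponent q predicted
# from the sample's CE loss, clean meta set for the outer update). Not the
# authors' code; only the general meta-learning pattern is illustrated.
import torch
import torch.nn.functional as F

def gce_loss(logits, targets, q):
    # Per-sample GCE loss (1 - p_y^q) / q with an instance-dependent exponent q.
    p_y = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return (1.0 - p_y.clamp_min(1e-6) ** q) / q

class Adjuster(torch.nn.Module):
    # Maps a per-sample feature (here: its CE loss) to an exponent q in (0.05, 1).
    def __init__(self, hidden=16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(1, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, 1))
    def forward(self, ce):
        return 0.05 + 0.95 * torch.sigmoid(self.net(ce.unsqueeze(1))).squeeze(1)

# Toy linear classifier kept as a raw tensor so the virtual update stays differentiable.
d, c, lr, meta_lr = 20, 3, 0.1, 1e-2
W = (0.01 * torch.randn(d, c)).requires_grad_()
adjuster = Adjuster()
opt_w = torch.optim.SGD([W], lr=lr)
opt_a = torch.optim.Adam(adjuster.parameters(), lr=meta_lr)

def train_step(x, y, x_meta, y_meta):
    # 1) Virtual classifier step with the current adjuster (graph retained).
    logits = x @ W
    ce = F.cross_entropy(logits, y, reduction="none")
    q = adjuster(ce.detach())                       # instance-dependent hyperparameters
    loss = gce_loss(logits, y, q).mean()
    grad_w, = torch.autograd.grad(loss, W, create_graph=True)
    W_hat = W - lr * grad_w

    # 2) Meta step: update the adjuster so the virtually updated classifier
    #    performs well on a small clean meta set (plain CE suffices there).
    meta_loss = F.cross_entropy(x_meta @ W_hat, y_meta)
    opt_a.zero_grad(); meta_loss.backward(); opt_a.step()

    # 3) Real classifier step with the freshly updated adjuster.
    logits = x @ W
    ce = F.cross_entropy(logits, y, reduction="none")
    q = adjuster(ce.detach()).detach()
    loss = gce_loss(logits, y, q).mean()
    opt_w.zero_grad(); loss.backward(); opt_w.step()
    return loss.item(), meta_loss.item()

# Tiny synthetic run just to show the call pattern.
x, y = torch.randn(64, d), torch.randint(0, c, (64,))
xm, ym = torch.randn(16, d), torch.randint(0, c, (16,))
for _ in range(5):
    print(train_step(x, y, xm, ym))

The key design point is the alternation: the adjuster is updated through a differentiable one-step lookahead of the classifier, and the classifier is then updated with the adjuster's latest instance-dependent hyperparameters, so the two are mutually ameliorated over training.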
