Asymmetric Loss Functions for Learning with Noisy Labels

Robust loss functions are essential for training deep neural networks that generalize well in the presence of noisy labels. Symmetric loss functions have been shown to be robust to label noise, but the symmetric condition is overly restrictive. In this work, we propose a new class of loss functions, namely asymmetric loss functions, which are robust to various types of label noise. We investigate the general theoretical properties of asymmetric loss functions, including classification calibration, the excess risk bound, and noise tolerance. We also introduce the asymmetry ratio to measure how asymmetric a loss function is; empirically, a higher ratio provides better noise tolerance. Moreover, we modify several commonly-used loss functions and establish the necessary and sufficient conditions for them to be asymmetric. Experimental results on benchmark datasets demonstrate that asymmetric loss functions can outperform state-of-the-art methods. The code is available at https://github.com/hitcszx/ALFs.
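
To illustrate how such a loss is used in practice, the listing below is a minimal PyTorch sketch of one plausible asymmetric loss, an asymmetric variant of generalized cross entropy. The functional form, the class name AGCELoss, and the parameters a and q are assumptions made here for illustration only; the exact definitions are those given in the paper and the linked repository.

    # Minimal sketch, not the paper's exact implementation: the loss form below,
    # the name AGCELoss, and the parameters a and q are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    class AGCELoss(torch.nn.Module):
        """Hypothetical asymmetric variant of generalized cross entropy.

        loss(p_y) = ((a + 1)**q - (a + p_y)**q) / q, with a > 0 and 0 < q <= 1,
        where p_y is the softmax probability assigned to the (possibly noisy) label.
        """

        def __init__(self, a: float = 1.0, q: float = 0.5):
            super().__init__()
            self.a = a
            self.q = q

        def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
            probs = F.softmax(logits, dim=1)
            # Probability assigned to each sample's labeled class.
            p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
            loss = ((self.a + 1.0) ** self.q - (self.a + p_y) ** self.q) / self.q
            return loss.mean()

    # Usage: the criterion drops into an ordinary training loop in place of cross entropy.
    criterion = AGCELoss(a=1.0, q=0.5)
    logits = torch.randn(8, 10)             # batch of 8 samples, 10 classes
    targets = torch.randint(0, 10, (8,))    # labels, possibly corrupted by noise
    print(criterion(logits, targets).item())

The only design requirement illustrated here is that the loss is computed from the probability of the labeled class alone and is bounded, which is what allows such losses to tolerate mislabeled training examples.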
