Curriculum Loss: Robust Learning and Generalization against Label Corruption

Deep neural networks (DNNs) have great expressive power and can even memorize samples with wrong labels. It is therefore vital to ensure robustness and generalization of DNNs against label corruption. To this end, this paper studies the 0-1 loss, which has a monotonic relationship with an empirical adversarial (reweighted) risk~\citep{hu2016does}. Although the 0-1 loss has robust properties, it is difficult to optimize. To optimize the 0-1 loss efficiently while preserving its robustness, we propose a very simple and efficient loss, the curriculum loss (CL). CL is a tighter upper bound on the 0-1 loss than conventional summation-based surrogate losses, and it adaptively selects samples for model training. Consequently, CL can be viewed as a novel curriculum-based sample selection strategy that connects curriculum learning with robust learning. Experimental results on benchmark datasets validate the robustness of the proposed loss.
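As a rough illustration of the selection behavior described above, the sketch below implements a basic curriculum-loss-style selection rule: given per-sample surrogate losses that upper bound the 0-1 loss, it chooses the subset of samples minimizing the larger of (i) the summed loss of the selected samples and (ii) the number of discarded samples. The function and variable names are illustrative rather than taken from the paper's released code, and the sketch assumes NumPy is available.

import numpy as np

def curriculum_selection(per_sample_losses):
    # Basic curriculum-loss objective: min over v in {0,1}^n of
    # max( sum_i v_i * l_i , n - sum_i v_i ).
    # For a fixed number k of selected samples, picking the k smallest
    # losses minimizes the first term, so it suffices to scan k = 0..n.
    l = np.asarray(per_sample_losses, dtype=float)
    n = l.size
    order = np.argsort(l)                                   # losses in ascending order
    prefix = np.concatenate(([0.0], np.cumsum(l[order])))   # prefix[k] = sum of k smallest losses
    objective = np.maximum(prefix, n - np.arange(n + 1))    # max(selected loss, #discarded)
    k = int(np.argmin(objective))
    mask = np.zeros(n, dtype=bool)
    mask[order[:k]] = True                                  # samples kept for this update
    return objective[k], mask

# Toy usage: samples with large surrogate losses (likely mislabeled) are dropped.
losses = [0.1, 0.3, 0.2, 4.0, 5.5]
value, mask = curriculum_selection(losses)
print(value, mask)   # -> 2.0 [ True  True  True False False]

In practice the per-sample losses would come from a standard surrogate (e.g., hinge or cross-entropy) evaluated on the current mini-batch, with the model updated only on the selected samples.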

[1] Ivor W. Tsang et al. Masking: A New Perspective of Noisy Supervision. NeurIPS, 2018.

[2] Kibok Lee et al. Robust Inference via Generative Classifiers for Handling Noisy Labels. ICML, 2019.

[3] Xuelong Li et al. Robust Web Image Annotation via Exploring Multi-Facet and Structural Knowledge. IEEE Transactions on Image Processing, 2017.

[4] Richard Nock et al. Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach. CVPR, 2017.

[5] Shiguang Shan et al. Self-Paced Learning with Diversity. NIPS, 2014.

[6] Andrew M. Dai et al. Virtual Adversarial Training for Semi-Supervised Text Classification. arXiv, 2016.

[7] Jacob Goldberger et al. Training deep neural-networks using a noise adaptation layer. ICLR, 2016.

[8] Nuno Vasconcelos et al. On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost. NIPS, 2008.

[9] Joan Bruna et al. Training Convolutional Networks with Noisy Labels. ICLR, 2014.

[10] Gang Niu et al. Does Distributionally Robust Supervised Learning Give Robust Classifiers? ICML, 2016.

[11] Jason Weston et al. Curriculum learning. ICML, 2009.

[12] Xingrui Yu et al. How does Disagreement Help Generalization against Label Corruption? ICML, 2019.

[13] James Bailey et al. Dimensionality-Driven Learning with Noisy Labels. ICML, 2018.

[14] Mert R. Sabuncu et al. Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels. NeurIPS, 2018.

[15] Anima Anandkumar et al. Learning From Noisy Singly-labeled Data. ICLR, 2017.

[16] Li Fei-Fei et al. MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels. ICML, 2017.

[17] Christopher De Sa et al. Data Programming: Creating Large Training Sets, Quickly. NIPS, 2016.

[18] Shiguang Shan et al. Self-Paced Curriculum Learning. AAAI, 2015.

[19] Samy Bengio et al. Understanding deep learning requires rethinking generalization. ICLR, 2016.

[20] Xingrui Yu et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels. NeurIPS, 2018.

[21] Yufeng Liu et al. Robust Truncated Hinge Loss Support Vector Machines. Journal of the American Statistical Association, 2007.

[22] Michael I. Jordan et al. Convexity, Classification, and Risk Bounds. Journal of the American Statistical Association, 2006.

[23] John DeNero et al. L1 and L2 regularization for multiclass hinge loss models. MLSLP, 2011.

[24] Aditya Krishna Menon et al. Learning with Symmetric Label Noise: The Importance of Being Unhinged. NIPS, 2015.

[25] Daphne Koller et al. Self-Paced Learning for Latent Variable Models. NIPS, 2010.

[26] Kiyoharu Aizawa et al. Joint Optimization Framework for Learning with Noisy Labels. CVPR, 2018.

[27] Yoshua Bengio et al. A Closer Look at Memorization in Deep Networks. ICML, 2017.

[28] Hao Su et al. Crowdsourcing Annotations for Visual Object Detection. HCOMP@AAAI, 2012.

[29] Yueming Lyu et al. Marginalized Average Attentional Network for Weakly-Supervised Learning. ICLR, 2019.