Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach

We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted into another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting, and thus providing an end-to-end framework. Extensive experiments on MNIST, IMDB, CIFAR-10, CIFAR-100 and a large-scale dataset of clothing images, employing a diversity of architectures (stacking dense, convolutional, pooling, dropout, batch normalization, word embedding, LSTM and residual layers), demonstrate the noise robustness of our proposals. Incidentally, we also prove that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise.
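To make the two loss-correction procedures concrete, the sketch below shows backward- and forward-corrected cross-entropy in NumPy, together with one plausible multi-class estimator of the noise probabilities from the softmax outputs of a model trained on the noisy data. This is a minimal illustration, not the authors' reference implementation: the function names (backward_corrected_ce, forward_corrected_ce, estimate_T) are ours, and we assume a row-stochastic transition matrix T with T[i, j] = P(noisy label = j | clean label = i).

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax for a batch of logits, shape (n, c)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def backward_corrected_ce(logits, noisy_labels, T):
    """Backward correction: re-weight the vector of per-class losses by T^{-1},
    then pick the entry of the observed (noisy) label."""
    p = softmax(logits)                               # (n, c) predicted class probabilities
    per_class_loss = -np.log(p)                       # (n, c) cross-entropy against every class
    corrected = per_class_loss @ np.linalg.inv(T).T   # entry j = sum_i (T^{-1})_{ji} * loss_i
    rows = np.arange(len(noisy_labels))
    return corrected[rows, noisy_labels].mean()

def forward_corrected_ce(logits, noisy_labels, T):
    """Forward correction: push the clean-class predictions through T, so the
    network is fit to the noisy label distribution p(noisy | x) = T^T p(clean | x)."""
    p_noisy = softmax(logits) @ T                     # (n, c) predicted noisy-label probabilities
    rows = np.arange(len(noisy_labels))
    return -np.log(p_noisy[rows, noisy_labels]).mean()

def estimate_T(probs_noisy):
    """Estimate T from softmax outputs of a model trained on noisy data:
    for each class i, take the example the model deems most likely to be i
    and read off its full predicted distribution as row i of T."""
    c = probs_noisy.shape[1]
    T_hat = np.empty((c, c))
    for i in range(c):
        anchor = probs_noisy[:, i].argmax()           # most confident example for class i
        T_hat[i] = probs_noisy[anchor]
    return T_hat / T_hat.sum(axis=1, keepdims=True)   # row-normalize for numerical safety

# Toy usage: 3 classes with symmetric 20% noise.
T = np.full((3, 3), 0.1) + 0.7 * np.eye(3)
logits = np.random.randn(8, 3)
noisy_labels = np.random.randint(0, 3, size=8)
print(backward_corrected_ce(logits, noisy_labels, T),
      forward_corrected_ce(logits, noisy_labels, T))
```

In this sketch, the backward correction needs T to be invertible and can produce negative per-sample losses when T is badly conditioned, whereas the forward correction only requires a matrix multiplication on the predictions; in a deep-learning framework, either correction would simply replace the standard cross-entropy term while the rest of training stays unchanged.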
