DisturbLabel: Regularizing CNN on the Loss Layer

Overfitting in CNN training has long been combated with model regularization techniques such as weight decay, model averaging, and data augmentation. In this paper, we present DisturbLabel, an extremely simple algorithm that randomly replaces a fraction of the training labels with incorrect values at each iteration. Although it may seem counterintuitive to intentionally generate incorrect training labels, we show that DisturbLabel prevents the network from overfitting by implicitly averaging over exponentially many networks trained with different label sets. To the best of our knowledge, DisturbLabel is the first work to add noise at the loss layer. Moreover, DisturbLabel cooperates well with Dropout, providing complementary regularization. Experiments demonstrate competitive recognition results on several popular image recognition datasets.
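
A minimal sketch of the core idea in Python/NumPy follows. The helper name disturb_labels, the noise rate alpha, and the exact sampling scheme (with probability alpha, a label is redrawn uniformly from all classes, so it may stay correct by chance) are illustrative assumptions for this sketch, not necessarily the paper's precise formulation.

    import numpy as np

    def disturb_labels(labels, num_classes, alpha=0.1, rng=None):
        # Sketch of label disturbance: with probability alpha, replace each
        # ground-truth label with one drawn uniformly at random from all
        # num_classes classes (a "disturbed" label may remain correct).
        rng = np.random.default_rng() if rng is None else rng
        labels = np.asarray(labels).copy()
        mask = rng.random(labels.shape) < alpha            # which labels to disturb
        random_labels = rng.integers(0, num_classes, size=labels.shape)
        labels[mask] = random_labels[mask]
        return labels

    # Example: disturb a mini-batch of 8 labels over 10 classes.
    batch = np.array([3, 1, 4, 1, 5, 9, 2, 6])
    noisy_batch = disturb_labels(batch, num_classes=10, alpha=0.2)

In training, such a step would be applied to each mini-batch's labels just before computing the loss, at every iteration, while the labels stored in the dataset itself remain untouched.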
