Regularization of Deep Neural Networks with Spectral Dropout

The breakthrough performance on the 2012 ImageNet challenge was partly due to the 'Dropout' technique used to avoid overfitting. Here, we introduce a new regularization approach, 'Spectral Dropout', to improve the generalization ability of deep neural networks. We cast the proposed approach in the form of regular Convolutional Neural Network (CNN) weight layers by using a decorrelation transform with fixed basis functions. Our spectral dropout method prevents overfitting by eliminating weak and 'noisy' Fourier-domain coefficients of the neural network activations, leading to remarkably better results than current regularization methods. Furthermore, the proposed method is very efficient due to the fixed basis functions used for the spectral transformation. In particular, compared to Dropout and DropConnect, our method significantly speeds up network convergence during training (roughly ×2), with considerably higher neuron pruning rates (an increase of ∼30%). We demonstrate that spectral dropout can also be used in conjunction with other regularization approaches, resulting in additional performance gains.
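
The abstract only outlines the mechanism, so the snippet below is a minimal NumPy sketch of the core idea under stated assumptions: a fixed-basis 2-D FFT over the spatial dimensions of the activations, a per-map magnitude threshold controlled by a hypothetical keep_ratio parameter that zeroes out the weakest coefficients, and an optional Bernoulli mask on the surviving coefficients. It is an illustration, not the authors' exact formulation or training-time implementation.

```python
import numpy as np

def spectral_dropout(activations, keep_ratio=0.9, bernoulli_keep=0.95, rng=None):
    """Sketch of spectral dropout on a batch of feature maps.

    activations    : array of shape (N, C, H, W).
    keep_ratio     : fraction of spectral coefficients retained per map
                     (hypothetical hyper-parameter); the weakest
                     (1 - keep_ratio) coefficients are zeroed out.
    bernoulli_keep : probability of keeping each surviving coefficient,
                     analogous to standard dropout but in the spectral domain.
    """
    if rng is None:
        rng = np.random.default_rng()

    # Fixed-basis decorrelation transform: 2-D FFT over the spatial axes.
    spectrum = np.fft.fft2(activations, axes=(-2, -1))
    magnitude = np.abs(spectrum)

    # Per-map magnitude threshold: keep only the strongest coefficients.
    thresh = np.quantile(magnitude, 1.0 - keep_ratio,
                         axis=(-2, -1), keepdims=True)
    mask = magnitude >= thresh

    # Randomly drop a subset of the surviving coefficients as well.
    mask &= rng.random(mask.shape) < bernoulli_keep

    # Inverse transform back to the spatial domain; the small imaginary
    # part left over is numerical noise and is discarded.
    return np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real

# Usage on a dummy batch of feature maps
x = np.random.randn(8, 16, 32, 32).astype(np.float32)
y = spectral_dropout(x, keep_ratio=0.9)
print(x.shape, y.shape)
```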
