LocalDrop: A Hybrid Regularization for Deep Neural Networks

Developing regularization algorithms to mitigate overfitting is a major research area for neural networks. We propose LocalDrop, a new regularization approach for neural networks based on the local Rademacher complexity. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), involving both drop rates and weight matrices, is derived from a rigorously proven upper bound on the local Rademacher complexity. The complexity analysis also covers dropout in FCNs and DropBlock in CNNs with layer-wise keep rate matrices. Based on the new regularization function, we establish a two-stage training procedure that alternately optimizes the keep rate matrices and the weight matrices. Extensive experiments on different models demonstrate the effectiveness of LocalDrop in comparison with several existing algorithms, and examine the influence of different hyperparameters on the final performance.
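To make the two-stage idea concrete, below is a minimal PyTorch sketch of a fully-connected layer with a trainable keep rate vector and an alternating weight/keep-rate update. This is an illustrative rendering only: the class `LocalDropLinear`, the `regularizer()` surrogate that couples keep rates with weight magnitudes, and the simple alternating schedule are assumptions for exposition, not the paper's derived local Rademacher complexity bound or its exact optimization procedure.

```python
# Illustrative sketch of a LocalDrop-style layer (not the paper's exact method).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalDropLinear(nn.Module):
    """Fully-connected layer with a per-unit keep rate vector (hypothetical API)."""

    def __init__(self, in_features, out_features, init_keep=0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        # Unconstrained parameter mapped to keep rates in (0, 1) via a sigmoid.
        init_logit = math.log(init_keep / (1.0 - init_keep))
        self.keep_logit = nn.Parameter(torch.full((in_features,), init_logit))

    def keep_rate(self):
        return torch.sigmoid(self.keep_logit)

    def forward(self, x):
        p = self.keep_rate()
        if self.training:
            mask = (torch.rand_like(x) < p).float()   # Bernoulli(p) dropout mask
            x = x * mask / p.clamp(min=1e-6)          # inverted-dropout scaling
        return F.linear(x, self.weight)

    def regularizer(self, lam=1e-4):
        # Placeholder surrogate coupling keep rates and weight magnitudes,
        # standing in for the complexity-based penalty described in the abstract.
        p = self.keep_rate()
        return lam * torch.sum(p * self.weight.pow(2).sum(dim=0))


def two_stage_step(layer, x, y, loss_fn, opt_w, opt_p):
    """One alternating update: weights first, then keep rates (simplified schedule)."""
    # Stage 1: update weights with keep rates held fixed.
    opt_w.zero_grad(); opt_p.zero_grad()
    loss = loss_fn(layer(x), y) + layer.regularizer()
    loss.backward()
    opt_w.step()
    # Stage 2: update keep rates with weights held fixed.
    opt_w.zero_grad(); opt_p.zero_grad()
    loss = loss_fn(layer(x), y) + layer.regularizer()
    loss.backward()
    opt_p.step()
    return loss.item()


if __name__ == "__main__":
    layer = LocalDropLinear(20, 5)
    opt_w = torch.optim.SGD([layer.weight], lr=0.1)
    opt_p = torch.optim.Adam([layer.keep_logit], lr=1e-2)
    x = torch.randn(32, 20)
    y = torch.randint(0, 5, (32,))
    print(two_stage_step(layer, x, y, F.cross_entropy, opt_w, opt_p))
```

Keeping separate optimizers for the weights and the keep-rate parameters makes the alternation explicit: each stage backpropagates the same regularized loss but steps only one group of parameters.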
