Soft Dropout and Its Variational Bayes Approximation

Soft dropout, a generalization of standard “hard” dropout, is introduced to regularize the parameters of neural networks and prevent overfitting. We replace the Bernoulli-distributed “hard” dropout mask with a beta-distributed “soft” mask, so that hidden nodes are dropped to varying degrees: the mask coefficients are continuous values in the interval [0, 1] rather than only zero or one. To make the dropout rate adaptive through learnable distribution parameters, we approximate the beta-distributed masks with half-Gaussian and with half-Laplace distributed variables, giving two adaptive variants, and optimize the distribution parameters with stochastic gradient variational Bayes (SGVB), a variant of variational Bayes optimization. In the experiments, adaptive soft dropout generally performs better than soft dropout with a fixed dropout rate. In addition, soft dropout and its adaptive variants outperform the compared methods on both image classification and regression tasks.
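The abstract describes two mechanisms: a Beta-distributed soft mask, and half-Gaussian or half-Laplace masks whose scale can be learned with SGVB. The NumPy code below is a minimal sketch of both under stated assumptions, not the authors' implementation. The function names and the parameters `alpha`, `beta`, and `scale` are hypothetical. Dividing the Beta mask by its mean, in the style of inverted hard dropout, is an assumption. The abstract does not say how the half-Gaussian and half-Laplace samples are mapped onto [0, 1] or which variational objective is optimized, so the sketch stops at the pathwise (reparameterized) gradient that SGVB relies on.

```python
import numpy as np


def soft_dropout(x, alpha, beta, rng, training=True):
    """Soft dropout sketch: scale activations by a Beta(alpha, beta) mask.

    Each mask coefficient is a continuous value in [0, 1] instead of the 0/1
    value of a Bernoulli (hard dropout) mask. ASSUMPTION: the mask is divided
    by its mean alpha / (alpha + beta) so the expected activation is unchanged,
    mirroring inverted hard dropout; this lets inference skip the mask.
    """
    if not training:
        return x
    mask = rng.beta(alpha, beta, size=x.shape)
    return x * mask / (alpha / (alpha + beta))


def half_gaussian_mask(scale, shape, rng):
    """Reparameterized half-Gaussian sample m = scale * |eps|, eps ~ N(0, 1).

    Returns (m, dm/dscale). Given the noise, m is a deterministic and
    differentiable function of `scale`, so gradients can flow into `scale`;
    this pathwise gradient is what SGVB relies on.
    """
    eps = np.abs(rng.standard_normal(shape))
    return scale * eps, eps


def half_laplace_mask(scale, shape, rng):
    """Reparameterized half-Laplace sample m = scale * |l|, l ~ Laplace(0, 1).

    |Laplace(0, 1)| is Exponential(1), so the noise is drawn directly from it.
    Returns (m, dm/dscale).
    """
    eps = rng.exponential(1.0, size=shape)
    return scale * eps, eps


def mc_grad_of_mean_mask(sampler, scale, n, rng):
    """Monte Carlo estimate of d E[m] / d scale using the pathwise gradient."""
    _, dm_dscale = sampler(scale, (n,), rng)
    return dm_dscale.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.ones((1000, 64))
    y = soft_dropout(x, alpha=2.0, beta=2.0, rng=rng)
    print(y.mean())  # close to 1.0 because of the mean normalization
    # Analytic values: sqrt(2/pi) ~ 0.798 for half-Gaussian, 1.0 for half-Laplace.
    print(mc_grad_of_mean_mask(half_gaussian_mask, 0.5, 100_000, rng))
    print(mc_grad_of_mean_mask(half_laplace_mask, 0.5, 100_000, rng))
```

The noise `eps` is sampled independently of `scale`, so ordinary backpropagation through `m` gives the gradient of any loss with respect to `scale`. The Beta distribution has no equally simple location-scale reparameterization, which is one plausible reading of why the abstract turns to half-Gaussian and half-Laplace approximations.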
