Advanced Dropout: A Model-Free Methodology for Bayesian Dropout Optimization

Because training data are often scarce, overfitting is ubiquitous in real-world applications of deep neural networks (DNNs). We propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate. Specifically, the distribution parameters are optimized by stochastic gradient variational Bayes so that training can be carried out end to end. We evaluate the effectiveness of advanced dropout against nine dropout techniques on seven computer vision datasets (five small-scale datasets and two large-scale datasets) with various base models. Advanced dropout outperforms all the compared techniques on all the datasets. We further compare the effectiveness ratios and find that advanced dropout achieves the highest ratio in most cases. Next, we conduct a set of analyses of dropout rate characteristics, including the convergence of the adaptive dropout rate, the learned distributions of dropout masks, and a comparison with dropout rate generation without an explicit distribution. In addition, the ability to prevent overfitting is evaluated and confirmed. Finally, we extend the application of advanced dropout to uncertainty inference, network pruning, text classification, and regression. In these tasks as well, advanced dropout is superior to the corresponding compared methods.
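
The abstract does not spell out the mask distribution, so the following is only a minimal sketch of the general idea of sampling dropout masks from a distribution with learnable parameters. It assumes a Gaussian noise variable squashed by a sigmoid, which is one common reparameterizable choice and not necessarily the form used in the paper. The names `sample_dropout_mask`, `apply_dropout`, `mu`, and `sigma` are hypothetical, and the optimization of the parameters by stochastic gradient variational Bayes is omitted.

```python
import numpy as np

def sample_dropout_mask(mu, sigma, shape, rng):
    """Draw a continuous mask in (0, 1) with the reparameterization trick.

    Assumption: mask = sigmoid(mu + sigma * eps), eps ~ N(0, 1).
    The noise eps carries no parameters, so the mask stays
    differentiable with respect to mu and sigma in an autodiff framework.
    """
    eps = rng.standard_normal(shape)
    logits = mu + sigma * eps
    return 1.0 / (1.0 + np.exp(-logits))

def apply_dropout(features, mu, sigma, rng, training=True):
    """Scale features by a sampled mask (training) or a fixed one (evaluation)."""
    if training:
        mask = sample_dropout_mask(mu, sigma, features.shape, rng)
    else:
        # Assumption: at evaluation use sigmoid(mu), the median of the
        # mask distribution, instead of a random sample.
        mask = np.full(features.shape, 1.0 / (1.0 + np.exp(-mu)))
    return features * mask

rng = np.random.default_rng(0)
x = np.ones((4, 8))                      # toy feature map
y_train = apply_dropout(x, mu=0.0, sigma=1.0, rng=rng)
y_eval = apply_dropout(x, mu=0.0, sigma=1.0, rng=rng, training=False)
```

In this sketch `mu` shifts the typical mask value and so plays the role of an adaptive dropout rate, while `sigma` controls how noisy the mask is. In a full implementation both would be trainable parameters updated jointly with the network weights.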
