Generalized Dropout

Deep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dropout. One set of methods in this family, called Dropout++, is a version of Dropout with trainable parameters; classical Dropout emerges as a special case of this method. Another member of this family selects the width of neural network layers. Experiments show that these methods improve generalization performance over Dropout.
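To make the Dropout++ idea concrete, the sketch below shows one plausible way to implement a dropout layer whose keep probabilities are themselves trainable. The class name TrainableDropout, the sigmoid parameterization of the keep probabilities, and the straight-through gradient estimator are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class TrainableDropout(nn.Module):
    """Dropout with per-unit trainable keep probabilities (illustrative sketch)."""

    def __init__(self, num_features, init_keep_prob=0.5):
        super().__init__()
        # Store keep probabilities as unconstrained logits so optimization cannot
        # push them outside (0, 1); sigmoid recovers the probability.
        init_logit = torch.logit(torch.tensor(float(init_keep_prob)))
        self.logits = nn.Parameter(torch.full((num_features,), init_logit.item()))

    def forward(self, x):
        keep_prob = torch.sigmoid(self.logits)
        if self.training:
            # Sample hard Bernoulli gates, then apply the straight-through trick so
            # gradients reach keep_prob even though sampling is non-differentiable.
            hard_mask = torch.bernoulli(keep_prob.expand_as(x))
            mask = hard_mask + keep_prob - keep_prob.detach()
            return x * mask
        # At test time, scale by the expected gate value, as in standard Dropout.
        return x * keep_prob


# Usage: a hidden layer followed by the trainable dropout gate.
layer = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), TrainableDropout(256))
```

If the logits are frozen at their initial value, the layer behaves like ordinary Dropout with a fixed rate, which mirrors the abstract's claim that classical Dropout is a special case.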
