Robustly representing inferential uncertainty in deep neural networks through sampling

As deep neural networks (DNNs) are applied to increasingly challenging problems, they will need to be able to represent their own uncertainty. Modeling uncertainty is one of the key features of Bayesian methods. Using Bernoulli dropout with sampling at prediction time has recently been proposed as an efficient and well-performing variational inference method for DNNs. However, sampling from other multiplicative-noise-based variational distributions has not been investigated in depth. We evaluated Bayesian DNNs trained with Bernoulli or Gaussian multiplicative masking of either the units (dropout) or the weights (dropconnect). We tested the calibration of the probabilistic predictions of Bayesian convolutional neural networks (CNNs) on MNIST and CIFAR-10. Sampling at prediction time improved the calibration of the DNNs' probabilistic predictions. Sampling weights, whether Gaussian or Bernoulli, led to a more robust representation of uncertainty than sampling units. However, using either Gaussian or Bernoulli dropout led to higher test-set classification accuracy. Based on these findings, we used both Bernoulli dropout and Gaussian dropconnect concurrently, which we show approximates the use of a spike-and-slab variational distribution without increasing the number of learned parameters. We found that spike-and-slab sampling had higher test-set performance than Gaussian dropconnect and more robustly represented its uncertainty than Bernoulli dropout.
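
As a rough illustration of the masking schemes compared above, the sketch below shows Bernoulli unit masking (dropout), Gaussian weight masking (dropconnect), their concurrent use as a spike-and-slab-style sample, and Monte Carlo averaging at prediction time. This is a minimal sketch, not the paper's actual implementation: the tiny MLP, the function names, and the Gaussian mask variance p/(1-p) (the usual Gaussian-dropout parameterisation) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_unit_mask(h, p=0.5):
    # Bernoulli dropout: multiplicative 0/1 mask on units, rescaled to preserve the expectation.
    mask = rng.binomial(1, 1 - p, size=h.shape)
    return h * mask / (1 - p)

def gaussian_weight_noise(W, p=0.5):
    # Gaussian dropconnect: multiplicative Gaussian mask on weights with mean 1.
    # Variance p / (1 - p) is an assumption borrowed from the common Gaussian-dropout analogue.
    return W * rng.normal(1.0, np.sqrt(p / (1 - p)), size=W.shape)

def forward_sample(x, params, p_unit=0.5, p_weight=0.5):
    # One stochastic forward pass of a small illustrative MLP. Applying the Gaussian
    # weight mask together with the Bernoulli unit mask is the spike-and-slab-style
    # sampling described in the abstract.
    W1, b1, W2, b2 = params
    h = np.maximum(x @ gaussian_weight_noise(W1, p_weight) + b1, 0.0)  # ReLU hidden layer
    h = bernoulli_unit_mask(h, p_unit)
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=-1, keepdims=True)               # numerically stable softmax
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

def mc_predict(x, params, n_samples=50):
    # Sampling at prediction time: keep the multiplicative noise switched on and
    # average the softmax outputs over many stochastic forward passes.
    return np.mean([forward_sample(x, params) for _ in range(n_samples)], axis=0)

# Hypothetical usage with randomly initialised parameters and MNIST-sized inputs.
x = rng.normal(size=(4, 784))
params = (rng.normal(scale=0.05, size=(784, 256)), np.zeros(256),
          rng.normal(scale=0.05, size=(256, 10)), np.zeros(10))
print(mc_predict(x, params).shape)  # (4, 10): averaged predictive probabilities
```

The averaged predictive distribution from `mc_predict` is what the calibration comparison refers to: its spread over classes reflects the model's uncertainty rather than a single deterministic softmax output.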
