Bayesian Convolutional Neural Networks

We introduce Bayesian Convolutional Neural Networks (BayesCNNs), a variant of Convolutional Neural Networks (CNNs) built upon Bayes by Backprop. We demonstrate how this reliable variational inference method can serve as a fundamental construct for various network architectures. On multiple datasets in supervised learning settings (MNIST, CIFAR-10, CIFAR-100, and STL-10), the proposed variational inference method achieves performance equivalent to frequentist inference in identical architectures, while naturally incorporating uncertainty estimates and regularisation. Bayes by Backprop has previously been applied successfully to feedforward and recurrent neural networks, but not to convolutional ones; this work extends Bayesian neural networks to cover all three of these architecture types.

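To make the underlying mechanism concrete, the sketch below shows one way a Bayesian convolutional layer trained with Bayes by Backprop can be written, assuming a fully factorised Gaussian posterior over the kernel weights and a standard Normal prior. The class name, initialisation constants, and PyTorch usage are illustrative assumptions for this sketch, not the authors' reference implementation.

# Minimal sketch of a Bayes-by-Backprop convolutional layer (assumed names
# and hyperparameters; not the authors' reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        shape = (out_channels, in_channels, kernel_size, kernel_size)
        # Variational parameters of the weight posterior: mean and rho,
        # with sigma = softplus(rho) > 0.
        self.weight_mu = nn.Parameter(torch.zeros(shape).normal_(0, 0.1))
        self.weight_rho = nn.Parameter(torch.full(shape, -3.0))
        self.stride, self.padding = stride, padding
        self.kl = torch.tensor(0.0)

    def forward(self, x):
        sigma = F.softplus(self.weight_rho)
        # Reparameterisation trick: w = mu + sigma * eps, eps ~ N(0, I).
        eps = torch.randn_like(sigma)
        weight = self.weight_mu + sigma * eps
        # Closed-form KL(q(w) || N(0, I)), accumulated for the ELBO.
        self.kl = (torch.log(1.0 / sigma)
                   + (sigma ** 2 + self.weight_mu ** 2) / 2 - 0.5).sum()
        return F.conv2d(x, weight, stride=self.stride, padding=self.padding)

In training, the loss would be the negative evidence lower bound: the usual data term (e.g. cross-entropy computed on the sampled forward pass) plus the summed kl terms of all Bayesian layers, typically scaled down by the number of minibatches per epoch.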