Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks

Bayesian neural networks (BNNs) hold great promise as a flexible and principled way to handle uncertainty when learning from finite data. Among approaches to probabilistic inference in deep neural networks, variational Bayes (VB) is theoretically grounded, generally applicable, and computationally efficient. Given these widely recognized advantages, why has variational Bayes seen so little practical use for BNNs in real applications? We argue that variational inference in neural networks is fragile: successful implementations require careful initialization and tuning of prior variances, as well as control of the variance of Monte Carlo gradient estimates. We fix VB and turn it into a robust inference tool for Bayesian neural networks. We achieve this with two innovations: first, a novel deterministic method for approximating moments in neural networks, which eliminates gradient variance; second, a hierarchical prior over parameters together with a novel empirical Bayes procedure that selects prior variances automatically. Combining these two innovations yields a method that is highly efficient and robust. On heteroscedastic regression tasks, we demonstrate strong predictive performance relative to alternative approaches.
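To make the first innovation concrete: instead of sampling weights and activations, means and variances are propagated through the network in closed form, so the gradient estimator has no Monte Carlo noise. The sketch below is a minimal illustration of this idea, not the paper's full scheme (which also tracks covariances): it propagates diagonal Gaussian moments through one Bayesian linear layer and a ReLU, using the standard closed-form moments of a ReLU applied to a Gaussian. All function and variable names here are ours.

```python
import numpy as np
from scipy.stats import norm

def relu_moments(mu, var):
    """Mean and variance of relu(x) for x ~ N(mu, var), elementwise.

    Standard Gaussian results, with a = mu / sigma:
      E[relu(x)]   = mu * Phi(a) + sigma * phi(a)
      E[relu(x)^2] = (mu^2 + var) * Phi(a) + mu * sigma * phi(a)
    """
    sigma = np.sqrt(var)
    a = mu / sigma
    pdf, cdf = norm.pdf(a), norm.cdf(a)
    mean = mu * cdf + sigma * pdf
    second_moment = (mu**2 + var) * cdf + mu * sigma * pdf
    return mean, second_moment - mean**2

def linear_moments(mean_in, var_in, w_mean, w_var):
    """Moments of y = W x with independent Gaussian weights and inputs
    (a diagonal/independence assumption; the paper goes beyond this).

    Per summand: Var[Wx] = E[W]^2 Var[x] + Var[W] E[x]^2 + Var[W] Var[x].
    """
    mean_out = w_mean @ mean_in
    var_out = (w_mean**2) @ var_in + w_var @ (mean_in**2) + w_var @ var_in
    return mean_out, var_out

# Example: deterministically propagate input moments through one layer.
rng = np.random.default_rng(0)
x_mean, x_var = rng.normal(size=4), np.full(4, 0.1)
w_mean, w_var = 0.5 * rng.normal(size=(3, 4)), np.full((3, 4), 0.05)
h_mean, h_var = linear_moments(x_mean, x_var, w_mean, w_var)
a_mean, a_var = relu_moments(h_mean, h_var)
```

Because the output moments are differentiable functions of the variational parameters, the ELBO can be optimized with ordinary gradients and no reparameterization sampling.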

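The second innovation replaces hand-tuned prior variances with an empirical Bayes update. As a minimal sketch of the idea: under a factorized Gaussian posterior q(w_i) = N(m_i, v_i) and a shared zero-mean Gaussian prior N(0, s), the KL term of the ELBO is 0.5 * sum_i [(m_i^2 + v_i)/s + log s - log v_i - 1], which is minimized in closed form at s* = mean(m_i^2 + v_i). The paper's hierarchical prior additionally places a hyperprior on s, which this simplified version omits; the helper name is ours.

```python
import numpy as np

def empirical_bayes_prior_variance(m, v):
    """ELBO-maximizing variance s of a shared prior N(0, s), given a
    factorized Gaussian posterior q(w_i) = N(m_i, v_i).

    Setting d/ds of 0.5 * sum_i [(m_i^2 + v_i)/s + log s - ...] to zero
    gives the closed-form update s* = mean(m_i^2 + v_i).
    """
    return float(np.mean(np.asarray(m)**2 + np.asarray(v)))
```

In practice this update can be interleaved with gradient steps on the variational parameters: after each step (or each epoch), s is re-estimated in closed form, so no prior variance ever needs manual tuning.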