Radial and Directional Posteriors for Bayesian Deep Learning

We propose a new variational family for Bayesian neural networks. We decompose the variational posterior into two components: a radial component that captures the strength of each neuron through its magnitude, and a directional component that captures the statistical dependencies among the weight parameters. The dependencies learned via the directional density yield better modeling performance than the widely used Gaussian mean-field variational family. In addition, the neuron strengths learned via our posterior provide a structured way to compress neural networks. Experiments show that our variational family improves predictive performance and yields compressed networks at the same time.
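
To make the decomposition concrete, the following is a minimal sketch, not the authors' implementation: it factorizes each weight row into a per-neuron magnitude (radial part) times a unit direction (directional part). As stand-ins, it uses a log-normal radial density and a normalized-Gaussian surrogate for the directional density; the paper's actual distributional choices may differ, and every class and parameter name below is hypothetical.

```python
# Hypothetical sketch of a radial-directional variational layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RadialDirectionalLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Directional parameters: one mean direction per output neuron.
        self.dir_mean = nn.Parameter(torch.randn(out_features, in_features))
        self.dir_log_std = nn.Parameter(torch.full((out_features, in_features), -3.0))
        # Radial parameters: one magnitude ("strength") per output neuron.
        self.rad_mu = nn.Parameter(torch.zeros(out_features, 1))
        self.rad_log_sigma = nn.Parameter(torch.full((out_features, 1), -3.0))

    def sample_weight(self):
        # Directional sample: perturb the mean direction, then project onto the
        # unit hypersphere (a crude surrogate for a von Mises-Fisher-style draw).
        eps = torch.randn_like(self.dir_mean)
        direction = self.dir_mean + eps * self.dir_log_std.exp()
        direction = direction / direction.norm(dim=1, keepdim=True)
        # Radial sample: log-normal magnitude per output neuron.
        log_radius = self.rad_mu + torch.randn_like(self.rad_mu) * self.rad_log_sigma.exp()
        radius = log_radius.exp()
        # Weight row = magnitude * unit direction.
        return radius * direction

    def forward(self, x):
        return F.linear(x, self.sample_weight())
```

In this sketch, the learned radial magnitudes play the role of the neuron strengths mentioned above: rows whose posterior radius concentrates near zero can be pruned, which is one way the decomposition supports structured compression.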
