Radial and Directional Posteriors for Bayesian Deep Learning

We propose a new variational family for Bayesian neural networks. We decompose the variational posterior into two components: a radial component that captures the strength of each neuron through its magnitude, and a directional component that captures the statistical dependencies among the weight parameters. The dependencies learned via the directional density yield better modeling performance than the widely used Gaussian mean-field variational family. In addition, the neuron strengths learned via our posterior provide a structured way to compress neural networks. Experiments show that our variational family improves predictive performance and yields compressed networks at the same time.
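
To make the decomposition concrete, the following is a minimal sketch, not the authors' implementation: it factorizes each weight row into a per-neuron magnitude (radial part) times a unit direction (directional part). As stand-ins, it uses a log-normal radial density and a normalized-Gaussian surrogate for the directional density; the paper's actual distributional choices may differ, and every class and parameter name below is hypothetical.

```python
# Hypothetical sketch of a radial-directional variational layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RadialDirectionalLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Directional parameters: one mean direction per output neuron.
        self.dir_mean = nn.Parameter(torch.randn(out_features, in_features))
        self.dir_log_std = nn.Parameter(torch.full((out_features, in_features), -3.0))
        # Radial parameters: one magnitude ("strength") per output neuron.
        self.rad_mu = nn.Parameter(torch.zeros(out_features, 1))
        self.rad_log_sigma = nn.Parameter(torch.full((out_features, 1), -3.0))

    def sample_weight(self):
        # Directional sample: perturb the mean direction, then project onto the
        # unit hypersphere (a crude surrogate for a von Mises-Fisher-style draw).
        eps = torch.randn_like(self.dir_mean)
        direction = self.dir_mean + eps * self.dir_log_std.exp()
        direction = direction / direction.norm(dim=1, keepdim=True)
        # Radial sample: log-normal magnitude per output neuron.
        log_radius = self.rad_mu + torch.randn_like(self.rad_mu) * self.rad_log_sigma.exp()
        radius = log_radius.exp()
        # Weight row = magnitude * unit direction.
        return radius * direction

    def forward(self, x):
        return F.linear(x, self.sample_weight())
```

In this sketch, the learned radial magnitudes play the role of the neuron strengths mentioned above: rows whose posterior radius concentrates near zero can be pruned, which is one way the decomposition supports structured compression.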
