Efficient Low Rank Gaussian Variational Inference for Neural Networks

Bayesian neural networks are enjoying a renaissance driven in part by recent advances in variational inference (VI). The most common form of VI employs a fully factorized, or mean-field, distribution, but this is known to suffer from several pathologies, especially since true posteriors are expected to have highly correlated parameters. Current algorithms that capture these correlations with a Gaussian approximating family are difficult to scale to large models due to computational costs and the high variance of their gradient updates. Using a new form of the reparametrization trick, we derive a computationally efficient algorithm for performing VI with a Gaussian family whose covariance has a low-rank-plus-diagonal structure. The method scales to deep feed-forward and convolutional architectures. We find that adding low-rank terms to a parametrized diagonal covariance does not improve predictive performance except on small networks, whereas low-rank terms added to a constant diagonal covariance improve performance on both small and large-scale architectures.
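
To make the covariance structure concrete, the following is a minimal sketch (in NumPy; the function and variable names are illustrative, not taken from the paper) of a standard reparametrized draw from a Gaussian with covariance diag(sigma^2) + V V^T: two independent noise vectors drive the diagonal and rank-r components, respectively. The paper's own variance-reduced form of the trick may differ in detail.

```python
import numpy as np

def sample_low_rank_gaussian(mu, sigma, V, rng=None):
    """Draw one sample from N(mu, diag(sigma**2) + V @ V.T).

    mu    : (d,)   variational mean
    sigma : (d,)   diagonal standard deviations
    V     : (d, r) low-rank factor with r << d
    """
    rng = rng or np.random.default_rng()
    eps_diag = rng.standard_normal(mu.shape)    # noise driving the diagonal part
    eps_rank = rng.standard_normal(V.shape[1])  # noise driving the rank-r part
    # The sample is an affine function of (mu, sigma, V), so in an autodiff
    # framework ELBO gradients flow through this step (the reparametrization trick).
    return mu + sigma * eps_diag + V @ eps_rank

# Example: sample the weights of a 784 -> 100 layer with a rank-2 correction.
d, r = 784 * 100, 2
mu, sigma = np.zeros(d), 0.1 * np.ones(d)
V = 0.01 * np.random.default_rng(0).standard_normal((d, r))
w = sample_low_rank_gaussian(mu, sigma, V)
```

Because the two noise vectors are independent, the covariance of the draw is exactly diag(sigma^2) + V V^T, and the cost per sample is O(dr) rather than the O(d^2) of a full-covariance Gaussian.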
