Improving Bayesian Inference in Deep Neural Networks with Variational Structured Dropout

Approximate inference in deep Bayesian networks faces a trade-off between producing high-fidelity posterior approximations and maintaining computational efficiency and scalability. We tackle this challenge by introducing a new structured variational approximation inspired by the interpretation of Dropout training as approximate inference in Bayesian probabilistic models. Concretely, we focus on the restrictive factorized structure of the Dropout posterior, which is too inflexible to capture the rich correlations among weight parameters of the true posterior, and we propose a novel method called Variational Structured Dropout (VSD) to overcome this limitation. VSD employs an orthogonal transformation to learn a structured representation of the variational Dropout noise and consequently induces statistical dependencies in the approximate posterior. We further obtain more expressive Bayesian modeling for VSD by proposing a hierarchical Dropout procedure that corresponds to joint inference in a Bayesian network. Moreover, VSD scales directly to modern deep convolutional networks at low computational cost. Finally, we conduct extensive experiments on standard benchmarks to demonstrate the effectiveness of VSD over state-of-the-art methods in both predictive accuracy and uncertainty estimation.
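
To make the core idea concrete, below is a minimal PyTorch sketch (not the authors' implementation) of a single linear layer with structured multiplicative Dropout noise: factorized Gaussian noise is rotated by a learned Householder reflection, so the induced noise, and hence the implied weight posterior, carries correlations rather than being fully factorized. The class name StructuredDropoutLinear, the single-reflection parameterization, and the initialization constants are illustrative assumptions; the hierarchical Dropout procedure and the KL term of the variational objective described in the paper are omitted here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredDropoutLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Log-variance of the per-unit multiplicative Gaussian dropout noise.
        self.log_alpha = nn.Parameter(torch.full((in_features,), -3.0))
        # Householder vector parameterizing a single learned orthogonal reflection.
        self.v = nn.Parameter(torch.randn(in_features))

    def householder(self, z):
        # Apply H = I - 2 v v^T (with unit-norm v) along the last dimension of z.
        v = self.v / self.v.norm()
        return z - 2.0 * (z @ v).unsqueeze(-1) * v

    def forward(self, x):
        # Factorized noise z ~ N(0, diag(alpha)) is rotated by the orthogonal map H,
        # yielding correlated noise with covariance H diag(alpha) H^T; the noise is
        # then applied multiplicatively to the layer input, as in Gaussian dropout.
        alpha = self.log_alpha.exp()
        z = alpha.sqrt() * torch.randn_like(x)
        xi = 1.0 + self.householder(z)
        return F.linear(x * xi, self.weight, self.bias)

# Usage: layer = StructuredDropoutLinear(784, 256); y = layer(torch.randn(32, 784))

Stacking several Householder reflections would give a richer orthogonal transform at the cost of a few extra vector parameters per layer; the single reflection above is the simplest instance of that design choice.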
