Paper information - Variational Gaussian Dropout is not Bayesian

Variational Gaussian Dropout is not Bayesian

Gaussian multiplicative noise is commonly used as a stochastic regularisation technique in training of deterministic neural networks. A recent paper reinterpreted the technique as a specific algorithm for approximate inference in Bayesian neural networks; several extensions ensued. We show that the log-uniform prior used in all the above publications does not generally induce a proper posterior, and thus Bayesian inference in such models is ill-posed. Independent of the log-uniform prior, the correlated weight noise approximation has further issues leading to either infinite objective or high risk of overfitting. The above implies that the reported sparsity of obtained solutions cannot be explained by Bayesian or the related minimum description length arguments. We thus study the objective from a non-Bayesian perspective, provide its previously unknown analytical form which allows exact gradient evaluation, and show that the later proposed additive reparametrisation introduces minima not present in the original multiplicative parametrisation. Implications and future research directions are discussed.
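The claim that the log-uniform prior need not induce a proper posterior can be illustrated with a short calculation. This is only a sketch for a single scalar weight $w$; the symbols $D$ (data), $c$, and $\varepsilon$, and the assumption that the likelihood $p(D \mid w)$ is continuous and strictly positive at $w = 0$, are introduced here for illustration and are not taken from the abstract.

```latex
% Log-uniform prior on a scalar weight: log|w| is "uniform" on the real line.
p(w) \propto \frac{1}{|w|},
\qquad
\int_{0}^{\infty} \frac{1}{w}\,\mathrm{d}w
  = \lim_{a \to 0^{+},\; b \to \infty} \bigl(\log b - \log a\bigr) = \infty .

% Assumption: p(D | w) is continuous and positive at w = 0, so there exist
% c > 0 and \varepsilon > 0 with p(D | w) >= c whenever |w| <= \varepsilon.
\int_{-\infty}^{\infty} \frac{p(D \mid w)}{|w|}\,\mathrm{d}w
  \;\ge\; c \int_{-\varepsilon}^{\varepsilon} \frac{1}{|w|}\,\mathrm{d}w
  \;=\; 2c \int_{0}^{\varepsilon} \frac{1}{w}\,\mathrm{d}w
  \;=\; \infty .
```

Under these assumptions the normalising constant is infinite, so the product of likelihood and prior cannot be normalised into a probability distribution over $w$. The divergence comes from the prior's mass near zero, which the likelihood does not suppress.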

Zoubin Ghahramani | A. G. de G. Matthews | Jiri Hron

[1] F. Harris. Tables of the exponential integral $Ei(x)$, 1957.

[2] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[3] Tong Zhang, et al. Regularized Winnow Methods, 2000, NIPS.

[4] Tong Zhang. From ɛ-entropy to KL-entropy: Analysis of minimum information complexity density estimation, 2006, math/0702653.

[5] Hari M. Srivastava, et al. A note on harmonic numbers, umbral calculus and generating functions, 2008, Appl. Math. Lett.

[6] V. Koltchinskii. Sparse recovery in convex hulls via entropy penalization, 2009, 0905.2078.

[7] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[8] Diederik P. Kingma, et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.

[9] Dmitry P. Vetrov, et al. Variational Dropout Sparsifies Deep Neural Networks, 2017, ICML.

[10] Max Welling, et al. Bayesian Compression for Deep Learning, 2017, NIPS.

[11] Dmitry P. Vetrov, et al. Structured Bayesian Pruning via Log-Normal Multiplicative Noise, 2017, NIPS.