Alpha-Divergences in Variational Dropout

We investigate the use of alternatives to the Kullback-Leibler (KL) divergence in variational inference (VI), building on Variational Dropout \cite{kingma2015}. Stochastic gradient variational Bayes (SGVB) \cite{aevb} is a general framework for estimating the evidence lower bound (ELBO) in variational Bayes. In this work, we extend the SGVB estimator to use alpha-divergences, a family of alternatives to VI's standard KL objective. Gaussian dropout can be seen as applying the local reparameterization trick to the SGVB objective, and we extend Variational Dropout accordingly to perform VI with alpha-divergences. Our experiments compare $\alpha$-divergence variational dropout against standard variational dropout with both correlated and uncorrelated weight noise. We find that the $\alpha$-divergence with $\alpha \rightarrow 1$ (i.e., the KL divergence) remains a good choice for variational inference, despite the reported effectiveness of alpha-divergences for dropout VI \cite{Li17}: $\alpha \rightarrow 1$ can yield the lowest training error and optimizes a good lower bound on the model evidence among all values of the parameter $\alpha \in [0,\infty)$.
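For reference, the objective family in question is the variational Rényi bound of [11], which [8] applies to dropout networks. Writing $q(\theta)$ for the approximate posterior and $\mathcal{D}$ for the data,
\[
D_\alpha(q \,\|\, p) = \frac{1}{\alpha - 1} \log \int q(\theta)^{\alpha}\, p(\theta)^{1-\alpha}\, d\theta,
\qquad
\mathcal{L}_\alpha = \frac{1}{1-\alpha} \log \mathbb{E}_{q}\!\left[\left(\frac{p(\theta, \mathcal{D})}{q(\theta)}\right)^{1-\alpha}\right],
\]
and $\mathcal{L}_\alpha$ recovers the standard ELBO, $\mathbb{E}_q\left[\log p(\theta, \mathcal{D}) - \log q(\theta)\right]$, in the limit $\alpha \rightarrow 1$.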
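To make the estimator concrete, the following is a minimal NumPy sketch of a Monte Carlo estimate of $\mathcal{L}_\alpha$ on a toy conjugate Gaussian model (the toy model, function names, and settings are our illustrative choices, not the networks from our experiments):
\begin{verbatim}
import numpy as np

def log_gauss(x, mu, sigma):
    """Elementwise log-density of N(mu, sigma^2)."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma)**2

def renyi_bound(data, q_mu, q_sigma, alpha, n_samples=1000, seed=0):
    """Monte Carlo estimate of L_alpha for a toy model:
    theta ~ N(0, 1), data_i ~ N(theta, 1), q(theta) = N(q_mu, q_sigma^2)."""
    rng = np.random.default_rng(seed)
    # Reparameterized samples from q: theta = mu + sigma * eps.
    theta = q_mu + q_sigma * rng.standard_normal(n_samples)
    log_joint = (log_gauss(theta, 0.0, 1.0)                       # log prior
                 + np.array([log_gauss(data, t, 1.0).sum() for t in theta]))
    log_w = log_joint - log_gauss(theta, q_mu, q_sigma)           # log p/q
    if np.isclose(alpha, 1.0):
        return log_w.mean()                                       # ELBO (alpha -> 1)
    a = (1.0 - alpha) * log_w
    # Stable log-mean-exp, then rescale by 1/(1 - alpha).
    return (a.max() + np.log(np.exp(a - a.max()).mean())) / (1.0 - alpha)

data = np.random.default_rng(1).normal(0.5, 1.0, size=50)
for alpha in [0.0, 0.5, 1.0, 2.0]:
    print(alpha, renyi_bound(data, q_mu=0.4, q_sigma=0.15, alpha=alpha))
\end{verbatim}
As $\alpha \rightarrow 0^{+}$ the estimate approaches an importance-sampled estimate of the log evidence, while $\alpha \rightarrow 1$ gives the ELBO; intermediate values interpolate between the two.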

[1] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[2] Alexandre Lacoste, et al. Bayesian Hypernetworks, 2017, arXiv.

[3] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, arXiv.

[4] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[5] Ryan P. Adams, et al. Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks, 2015, ICML.

[6] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[7] Christopher D. Manning, et al. Fast dropout training, 2013, ICML.

[8] Yarin Gal, et al. Dropout Inference in Bayesian Neural Networks with Alpha-divergences, 2017, ICML.

[9] Jianhua Lin, et al. Divergence measures based on the Shannon entropy, 1991, IEEE Trans. Inf. Theory.

[10] Peter Harremoës, et al. Rényi Divergence and Kullback-Leibler Divergence, 2012, IEEE Trans. Inf. Theory.

[11] Richard E. Turner, et al. Rényi Divergence Variational Inference, 2016, NIPS.

[12] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine Learning.

[13] Kevin P. Murphy, et al. Machine Learning: A Probabilistic Perspective, 2012, Adaptive Computation and Machine Learning series.

[14] Diederik P. Kingma, et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.

[15] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[16] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[17] Zoubin Ghahramani, et al. Variational Gaussian Dropout is not Bayesian, 2017, arXiv:1711.02989.

[18] Richard E. Turner, et al. Black-box α-divergence minimization, 2016, ICML.

[19] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.

[20] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.

[21] Zoubin Ghahramani, et al. Deep Bayesian Active Learning with Image Data, 2017, ICML.