Black-Box Alpha Divergence Minimization

Black-box alpha (BB-$\alpha$) is a new approximate inference method based on the minimization of $\alpha$-divergences. BB-$\alpha$ scales to large datasets because it can be implemented using stochastic gradient descent. BB-$\alpha$ can be applied to complex probabilistic models with little effort since it only requires as input the likelihood function and its gradients. These gradients can be easily obtained using automatic differentiation. By changing the divergence parameter $\alpha$, the method is able to interpolate between variational Bayes (VB) ($\alpha \rightarrow 0$) and an algorithm similar to expectation propagation (EP) ($\alpha = 1$). Experiments on probit regression and neural network regression and classification problems show that BB-$\alpha$ with non-standard settings of $\alpha$, such as $\alpha = 0.5$, usually produces better predictions than with $\alpha \rightarrow 0$ (VB) or $\alpha = 1$ (EP).
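The interpolation between VB and EP described above can be checked numerically. The abstract does not state the divergence explicitly, so the sketch below assumes Amari's parameterization, $D_\alpha[p\|q] = \frac{1}{\alpha(1-\alpha)}\left(1 - \int p(\theta)^\alpha q(\theta)^{1-\alpha}\,d\theta\right)$, which tends to $\mathrm{KL}[q\|p]$ (the VB objective) as $\alpha \rightarrow 0$ and to $\mathrm{KL}[p\|q]$ (the EP-style objective) as $\alpha \rightarrow 1$. The two Gaussians, the grid, and all function names are illustrative choices, not taken from the paper, and this is not the BB-$\alpha$ algorithm itself.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def alpha_divergence(p, q, dx, alpha):
    """Amari alpha-divergence D_alpha[p||q] via a Riemann sum (assumed form)."""
    integral = np.sum(p ** alpha * q ** (1.0 - alpha)) * dx
    return (1.0 - integral) / (alpha * (1.0 - alpha))


def kl_divergence(p, q, dx):
    """KL[p||q] via a Riemann sum on the same grid."""
    return np.sum(p * np.log(p / q)) * dx


# Illustrative densities: p plays the role of the target, q the approximation.
x = np.linspace(-12.0, 12.0, 20001)
dx = x[1] - x[0]
p = gaussian_pdf(x, 0.0, 1.0)
q = gaussian_pdf(x, 1.0, 1.5)

near_vb = alpha_divergence(p, q, dx, 1e-3)        # alpha close to 0
halfway = alpha_divergence(p, q, dx, 0.5)         # intermediate setting
near_ep = alpha_divergence(p, q, dx, 1.0 - 1e-3)  # alpha close to 1

print("alpha ~ 0 :", near_vb, " KL[q||p] =", kl_divergence(q, p, dx))
print("alpha = .5:", halfway)
print("alpha ~ 1 :", near_ep, " KL[p||q] =", kl_divergence(p, q, dx))
```

Under these assumptions the values printed for $\alpha$ near 0 and near 1 should sit close to the two KL divergences, while $\alpha = 0.5$ gives a symmetric compromise between them.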

Daniel Hernández-Lobato | Richard E. Turner | Thang D. Bui | José Miguel Hernández-Lobato | Yingzhen Li | Mark Rowland
