A Sequential Marginal Likelihood Approximation Using Stochastic Gradients

Existing algorithms such as nested sampling and annealed importance sampling can produce accurate estimates of the marginal likelihood of a model, but they tend to scale poorly to large data sets because they must recompute the log-likelihood at every iteration by summing over the whole data set. Scaling efficiently to large data sets requires algorithms that visit only small subsets (mini-batches) of the data on each iteration. To this end, we estimate the marginal likelihood via a sequential decomposition into a product of predictive distributions, $p(y_{1:N}) = \prod_{n=1}^{N} p(y_n \mid y_{<n})$. These predictive distributions can be approximated efficiently through Bayesian updating with stochastic gradient Hamiltonian Monte Carlo, which approximates likelihood gradients using mini-batches. Since each data point typically carries little information compared to the whole data set, convergence to each successive posterior requires only a short burn-in phase. This approach can be viewed as a special case of sequential Monte Carlo (SMC) with a single particle, but it differs from typical SMC methods in its use of stochastic gradients. We illustrate with some simple models how this approach scales favourably to large data sets.
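The sequential decomposition above can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes a one-dimensional parameter, a toy conjugate model ($y_i \sim \mathcal{N}(\theta, 1)$, $\theta \sim \mathcal{N}(0, 1)$) chosen only because its marginal likelihood has a closed form for comparison, an SGHMC update with friction `alpha` and the gradient-noise estimate set to zero, and a step size shrinking as `base_eta / (n + 1)`, which is our own heuristic for a posterior whose precision grows roughly linearly in n. All function names and hyperparameter values are hypothetical. For each n, a single chain takes a short burn-in toward $p(\theta \mid y_{<n})$, $\log p(y_n \mid y_{<n})$ is estimated as the log of the average likelihood of $y_n$ over thinned samples, and these per-point estimates are summed.

```python
import numpy as np


def sghmc_steps(theta, v, grad_U, n_steps, eta, alpha, rng):
    """Run n_steps of SGHMC with the gradient-noise estimate set to zero (assumption)."""
    noise_sd = np.sqrt(2.0 * alpha * eta)
    for _ in range(n_steps):
        # Momentum update: friction, stochastic gradient step, injected Gaussian noise.
        v = (1.0 - alpha) * v - eta * grad_U(theta) + rng.normal(0.0, noise_sd)
        theta = theta + v
    return theta, v


def sequential_log_evidence(y, log_lik, grad_log_lik, grad_log_prior, theta0,
                            batch_size=20, base_eta=0.1, alpha=0.3,
                            burn_in=20, n_samples=100, thin=5, seed=0):
    """Estimate log p(y_1..y_N) = sum_n log p(y_n | y_<n) with a single SGHMC chain."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    theta, v = float(theta0), 0.0
    log_z = 0.0
    for n in range(len(y)):
        # Hypothetical heuristic: posterior precision grows roughly like n + 1.
        eta = base_eta / (n + 1)

        def grad_U(th):
            # Mini-batch estimate of grad U, U = -log p(theta) - sum_{i<n} log p(y_i | theta).
            if n == 0:
                return -grad_log_prior(th)
            if n <= batch_size:
                idx = np.arange(n)
            else:
                idx = rng.integers(0, n, size=batch_size)
            return -grad_log_prior(th) - (n / len(idx)) * grad_log_lik(th, y[idx])

        # Short burn-in: the chain starts from the previous, one-point-smaller posterior.
        theta, v = sghmc_steps(theta, v, grad_U, burn_in, eta, alpha, rng)
        log_terms = np.empty(n_samples)
        for s in range(n_samples):
            theta, v = sghmc_steps(theta, v, grad_U, thin, eta, alpha, rng)
            log_terms[s] = log_lik(theta, y[n])
        # log p(y_n | y_<n) ~= log mean_s p(y_n | theta_s), computed via log-sum-exp.
        m = log_terms.max()
        log_z += m + np.log(np.mean(np.exp(log_terms - m)))
    return float(log_z)


# Illustrative toy model: y_i ~ N(theta, 1), theta ~ N(0, 1).
def log_lik(theta, y_i):
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (y_i - theta) ** 2


def grad_log_lik(theta, y_batch):
    # Sum over the batch of d/dtheta log N(y_i; theta, 1).
    return np.sum(y_batch - theta)


def grad_log_prior(theta):
    return -theta


def exact_log_evidence(y):
    # Marginally y ~ N(0, I + 11^T); determinant 1 + N, inverse I - 11^T / (1 + N).
    y = np.asarray(y, dtype=float)
    n = len(y)
    quad = np.sum(y ** 2) - np.sum(y) ** 2 / (1.0 + n)
    return float(-0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log(1.0 + n) - 0.5 * quad)


if __name__ == "__main__":
    data = 1.0 + np.random.default_rng(1).normal(size=100)
    print("SGHMC estimate:", sequential_log_evidence(
        data, log_lik, grad_log_lik, grad_log_prior, theta0=0.0))
    print("exact value:   ", exact_log_evidence(data))
```

Each SGHMC step touches only a mini-batch of at most `batch_size` points from $y_{<n}$, so its cost does not grow with n. Because the log of a Monte Carlo average is biased slightly downward (Jensen's inequality), the result should be read as an approximation rather than an unbiased estimate of the log marginal likelihood.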

Steve Kroon | Hans C. Eggers | Scott A. Cameron