A Sequential Marginal Likelihood Approximation Using Stochastic Gradients

Existing algorithms such as nested sampling and annealed importance sampling can produce accurate estimates of the marginal likelihood of a model, but they tend to scale poorly to large data sets because they must recompute the log-likelihood on every iteration by summing over the whole data set. Scaling efficiently to large data sets requires algorithms that visit only small subsets (mini-batches) of the data on each iteration. To this end, we estimate the marginal likelihood via a sequential decomposition into a product of predictive distributions, $p(y_{1:N}) = \prod_{n=1}^{N} p(y_n \mid y_{<n})$. Each predictive distribution can be approximated efficiently through Bayesian updating with stochastic gradient Hamiltonian Monte Carlo (SGHMC), which approximates likelihood gradients using mini-batches. Since a single data point typically carries little information compared with the whole data set, converging to each successive posterior requires only a short burn-in phase. This approach can be viewed as a special case of sequential Monte Carlo (SMC) with a single particle, but it differs from typical SMC methods in its use of stochastic gradients. We illustrate, with some simple models, how this approach scales favourably to large data sets.
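The decomposition above can be illustrated with a minimal sketch, assuming a one-dimensional conjugate Gaussian model ($\theta \sim N(0, 1)$, $y_n \mid \theta \sim N(\theta, 1)$) so that the exact marginal likelihood is available for comparison. Each predictive $p(y_n \mid y_{<n}) = \int p(y_n \mid \theta)\, p(\theta \mid y_{<n})\, d\theta$ is estimated by averaging $p(y_n \mid \theta_s)$ over samples $\theta_s$ from a single SGHMC chain targeting $p(\theta \mid y_{<n})$; once $y_n$ has been scored, the same chain is continued with a short burn-in on the enlarged data set. All function names are hypothetical, and the step-size and friction schedule ($\epsilon = h/\sqrt{1+n}$, $C = \gamma\sqrt{1+n}$), batch size, burn-in length and thinning are illustrative assumptions rather than settings taken from the paper. The update sets $\hat{B} = 0$, i.e. it does not correct for mini-batch gradient noise.

```python
import math
import random


def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def exact_log_marginal(data, prior_var=1.0, noise_var=1.0):
    """Exact log p(y_1..y_N) for theta ~ N(0, prior_var), y_n | theta ~ N(theta, noise_var),
    accumulated as a sum of exact one-step predictive log densities."""
    mean, var, total = 0.0, prior_var, 0.0
    for y in data:
        total += log_normal_pdf(y, mean, var + noise_var)  # log p(y_n | y_<n)
        new_var = 1.0 / (1.0 / var + 1.0 / noise_var)      # conjugate posterior update
        mean = new_var * (mean / var + y / noise_var)
        var = new_var
    return total


def sequential_sghmc_log_marginal(data, prior_var=1.0, noise_var=1.0,
                                  batch_size=10, burn_in=50, n_samples=30,
                                  thin=10, h=0.1, gamma=2.0, seed=0):
    """Approximate log p(y_1..y_N) = sum_n log p(y_n | y_<n) with one SGHMC chain.

    Hypothetical tuning: eps = h / sqrt(1 + n), friction = gamma * sqrt(1 + n),
    assuming the posterior curvature grows roughly linearly with n.
    """
    rng = random.Random(seed)
    theta = rng.gauss(0.0, math.sqrt(prior_var))  # start from a prior draw
    r = rng.gauss(0.0, 1.0)                        # momentum (unit mass)
    total = 0.0
    for n, y_next in enumerate(data):              # chain targets p(theta | data[:n])
        scale = math.sqrt(1.0 + n)
        eps, friction = h / scale, gamma * scale

        def grad_u(th):
            """Mini-batch estimate of dU/dtheta, U = -log p(theta) - log p(data[:n] | theta)."""
            g = th / prior_var
            if n == 0:
                return g                           # no data absorbed yet: prior only
            idx = range(n) if n <= batch_size else rng.sample(range(n), batch_size)
            weight = n / len(idx)                  # rescale the batch sum to all n points
            return g - weight * sum(data[i] - th for i in idx) / noise_var

        log_liks = []
        for step in range(burn_in + n_samples * thin):
            # SGHMC step (unit mass, no gradient-noise correction, i.e. B_hat = 0)
            theta += eps * r
            r += (-eps * grad_u(theta) - eps * friction * r
                  + rng.gauss(0.0, math.sqrt(2.0 * friction * eps)))
            if step >= burn_in and (step - burn_in) % thin == thin - 1:
                log_liks.append(log_normal_pdf(y_next, theta, noise_var))

        # log of the sample average of p(y_n | theta_s), computed via log-sum-exp
        m = max(log_liks)
        total += m + math.log(sum(math.exp(ll - m) for ll in log_liks) / len(log_liks))
    return total


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.5, 1.0) for _ in range(100)]
    print("exact :", round(exact_log_marginal(data), 2))
    print("SGHMC :", round(sequential_sghmc_log_marginal(data), 2))
```

Each chain step touches at most `batch_size` data points, so the cost per step does not grow with the number of points already absorbed; the exact recursion in `exact_log_marginal` is available here only because the toy model is conjugate.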
