The Scalable Langevin Exact Algorithm: Bayesian Inference for Big Data

This paper introduces a class of Monte Carlo algorithms based on simulating a Markov process whose quasi-stationary distribution coincides with the distribution of interest. This differs fundamentally from, say, current Markov chain Monte Carlo methods, in which we simulate a Markov chain whose stationary distribution is the target. We show how to approximate distributions of interest by carefully combining sequential Monte Carlo methods with methodology for the exact simulation of diffusions. Our methodology is particularly promising in that it is applicable to the same class of problems as gradient-based Markov chain Monte Carlo algorithms but entirely circumvents the need to conduct Metropolis-Hastings type accept/reject steps whilst retaining exactness: we have theoretical guarantees that we recover the correct limiting target distribution. Furthermore, this methodology is highly amenable to big data problems. By employing a modification to existing naive subsampling techniques, we can obtain an algorithm which is still exact but has sub-linear iterative cost as a function of data size.
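To make the notion of a quasi-stationary distribution concrete, the following is a minimal sketch on a toy finite-state chain with killing. It is not the algorithm of the paper, which works with killed diffusions in continuous time and sequential Monte Carlo; the matrix `P` and the function names `conditioned_step` and `quasi_stationary` are hypothetical and chosen only for illustration. The sketch repeatedly propagates a distribution through a sub-stochastic transition matrix and renormalises over the surviving mass, so that the limit is the law of the chain conditioned on not yet having been killed.

```python
def conditioned_step(mu, P):
    """Propagate mu one step through the killed chain, then renormalise
    over the mass that survived (i.e. condition on not being killed)."""
    n = len(mu)
    moved = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    survival = sum(moved)  # probability of surviving this step under mu
    return [m / survival for m in moved], survival


def quasi_stationary(P, iters=500):
    """Iterate the conditioned step from a uniform start; for a strictly
    positive sub-stochastic matrix this converges to the quasi-stationary
    distribution (the normalised left Perron eigenvector of P)."""
    n = len(P)
    mu = [1.0 / n] * n
    survival = 1.0
    for _ in range(iters):
        mu, survival = conditioned_step(mu, P)
    return mu, survival


# Hypothetical sub-stochastic matrix (assumption, for illustration only):
# each row sums to 0.9, so the chain is killed with probability 0.1 per step.
P = [
    [0.5, 0.3, 0.1],
    [0.2, 0.5, 0.2],
    [0.1, 0.3, 0.5],
]

qsd, rate = quasi_stationary(P)
print(qsd, rate)
```

For this particular matrix the fixed point can be checked by hand: the distribution (2/7, 3/7, 2/7) is a left eigenvector of `P` with eigenvalue 0.9, which is the per-step survival probability. The unconditioned chain has no stationary distribution, since all mass is eventually killed; only the conditioned law stabilises, and this is the sense in which a quasi-stationary distribution can serve as a target.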
