Quasi‐stationary Monte Carlo and the ScaLE algorithm

This paper introduces a class of Monte Carlo algorithms based on the simulation of a Markov process whose quasi-stationary distribution coincides with the distribution of interest. This differs fundamentally from current Markov chain Monte Carlo methods, which simulate a Markov chain whose stationary distribution is the target. We show how to approximate distributions of interest by carefully combining sequential Monte Carlo methods with methodology for the exact simulation of diffusions. The methodology introduced here is particularly promising in that it is applicable to the same class of problems as gradient-based Markov chain Monte Carlo algorithms, yet it entirely circumvents the need for Metropolis–Hastings-type accept–reject steps while retaining exactness: the paper gives theoretical guarantees that the algorithm has the correct limiting target distribution. Furthermore, the methodology is highly amenable to 'big data' problems. By modifying existing naive subsampling and control-variate techniques, one obtains an algorithm that is still exact but whose cost per iteration is sublinear in the size of the data.
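
The quasi-stationary construction can be illustrated with a toy particle approximation. The sketch below is a minimal illustration, not the paper's ScaLE implementation: particles follow (Euler-discretised) Brownian motion, each is killed at a user-supplied state-dependent rate kappa, and a killed particle is instantly reborn at the position of a uniformly chosen survivor, so that the particle cloud approximates the quasi-stationary distribution of the killed process. The function name fleming_viot_qsd, the discretisation, and the example killing rate kappa(x) = x^2/2 (the shifted rate associated with a standard normal target, since log pi(x) = -x^2/2 up to a constant) are illustrative assumptions.

```python
import numpy as np

def fleming_viot_qsd(kappa, x0, n_particles=1000, n_steps=5000, dt=1e-3, seed=0):
    """Crude Fleming-Viot style particle approximation of the quasi-stationary
    distribution of Brownian motion killed at state-dependent rate kappa(x).

    Each particle follows an Euler discretisation of Brownian motion; a particle
    killed during (t, t + dt] is instantly reborn at the location of a uniformly
    chosen surviving particle.  The empirical distribution of the particles
    approximates the quasi-stationary distribution for large n_particles,
    small dt and long run times.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, float(x0))
    for _ in range(n_steps):
        # Brownian increment for every particle.
        x = x + np.sqrt(dt) * rng.standard_normal(n_particles)
        # Kill each particle independently with probability approx. kappa(x) * dt.
        killed = rng.random(n_particles) < kappa(x) * dt
        survivors = np.flatnonzero(~killed)
        if survivors.size == 0:          # extremely unlikely for sensible dt
            raise RuntimeError("all particles killed in one step; decrease dt")
        # Rebirth: each killed particle jumps to a uniformly chosen survivor.
        x[killed] = x[rng.choice(survivors, size=killed.sum())]
    return x

# Example: standard normal target, for which the shifted non-negative killing
# rate ((d/dx log pi)^2 + d^2/dx^2 log pi)/2 + 1/2 equals x^2 / 2.
samples = fleming_viot_qsd(kappa=lambda x: 0.5 * x**2, x0=0.0)
print(samples.mean(), samples.var())     # roughly 0 and 1
```

In ScaLE itself the Brownian dynamics and killing events are handled exactly via path-space rejection sampling rather than Euler discretisation, and for large data sets the killing rate is replaced by an unbiased subsample estimate with control variates; the Bernoulli killing used above is only a didactic stand-in for that machinery.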
