The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data

Standard MCMC methods can scale poorly to big-data settings due to the need to evaluate the likelihood at each iteration. A number of approximate MCMC algorithms use sub-sampling ideas to reduce this computational burden, but with the drawback that they no longer target the true posterior distribution. We introduce a new family of Monte Carlo methods based upon a multi-dimensional version of the Zig-Zag process of Bierkens and Roberts (2017), a continuous-time piecewise deterministic Markov process. While traditional MCMC methods are reversible by construction (a property which is known to inhibit rapid convergence), the Zig-Zag process offers a flexible non-reversible alternative which we observe to often have favourable convergence properties. We show how the Zig-Zag process can be simulated without discretisation error, and give conditions for the process to be ergodic. Most importantly, we introduce a sub-sampling version of the Zig-Zag process that is an example of an "exact approximate" scheme, i.e. the resulting approximate process still has the posterior as its stationary distribution. Furthermore, if we use a control-variate idea to reduce the variance of our unbiased estimator, then the Zig-Zag process can be super-efficient: after an initial pre-processing step, essentially independent samples from the posterior distribution are obtained at a computational cost which does not depend on the size of the data.
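
To make the switching mechanism concrete, the sketch below simulates a one-dimensional Zig-Zag process by Poisson thinning: the state moves with constant velocity ±1 and the velocity flips at the events of an inhomogeneous Poisson process whose rate depends on the gradient of the negative log-posterior. The function names, the window-based rate bound, and the Gaussian example target are illustrative assumptions, not the implementation used in the paper; the sub-sampling version would replace grad_U with an unbiased estimate built from a single randomly chosen data point (plus a control variate), while the thinning step stays the same.

```python
import numpy as np

def zigzag_1d(grad_U, x0=0.0, theta0=1.0, T=1000.0, horizon=1.0, rate_slope=1.0, rng=None):
    """Minimal one-dimensional Zig-Zag sampler simulated by Poisson thinning.

    grad_U     : derivative of the negative log-density U(x) = -log pi(x)
    horizon    : length of the look-ahead window for the local rate bound
    rate_slope : assumed (problem-dependent) bound on how fast the switching
                 rate can grow along a trajectory segment of length `horizon`
    """
    rng = np.random.default_rng() if rng is None else rng
    x, theta, t = x0, theta0, 0.0
    skeleton = [(t, x, theta)]            # skeleton of switch events (time, position, velocity)
    while t < T:
        # Upper bound for lambda(x + theta*s, theta) = max(0, theta*U'(x + theta*s)) on s in [0, horizon]
        lam_bound = max(0.0, theta * grad_U(x)) + rate_slope * horizon
        tau = rng.exponential(1.0 / lam_bound)
        if tau > horizon:
            # no proposed event in this window: move deterministically and refresh the bound
            x += theta * horizon
            t += horizon
            continue
        x += theta * tau                  # deterministic linear motion up to the proposed event
        t += tau
        lam = max(0.0, theta * grad_U(x)) # true switching rate at the proposed event time
        if rng.random() < lam / lam_bound:
            theta = -theta                # accept: flip the velocity
            skeleton.append((t, x, theta))
    return skeleton

# Example: standard Gaussian target, U(x) = x^2 / 2, so U'(x) = x and rate_slope = 1 is valid
if __name__ == "__main__":
    skel = zigzag_1d(grad_U=lambda x: x, T=5000.0)
    print("number of velocity switches:", len(skel) - 1)
```

Because the trajectory between switches is deterministic, the skeleton of switch times, positions and velocities characterises the whole continuous-time path, and posterior expectations can be computed by integrating along the resulting piecewise-linear trajectory.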

[1] N. Metropolis et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.

[2] W. K. Hastings et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[3] Richard A. Johnson. Asymptotic Expansions Associated with Posterior Distributions, 1970.

[4] M. Kac. A stochastic model related to the telegrapher's equation, 1974.

[5] G. Shedler et al. Simulation of Nonhomogeneous Poisson Processes by Thinning, 1979.

[6] S. Duane et al. Hybrid Monte Carlo, 1987.

[7] J. Rice. Mathematical Statistics and Data Analysis, 1988.

[8] C. Hwang et al. Accelerating Gaussian Diffusions, 1993.

[9] R. Tweedie et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[10] Radford M. Neal et al. Suppressing Random Walks in Markov Chain Monte Carlo Using Ordered Overrelaxation, 1995, Learning in Graphical Models.

[11] Michael A. Gibson et al. Efficient Exact Stochastic Simulation of Chemical Systems with Many Species and Many Channels, 2000.

[12] Radford M. Neal et al. Analysis of a Nonreversible Markov Chain Sampler, 2000.

[13] David F. Anderson et al. A modified next reaction method for simulating chemical systems with time dependent propensities and delays, 2007, The Journal of Chemical Physics.

[14] Michael Chertkov et al. Irreversible Monte Carlo Algorithms for Efficient Sampling, 2008, arXiv.

[15] C. Andrieu et al. The pseudo-marginal approach for efficient Monte Carlo computations, 2009, arXiv:0903.5480.

[16] Yi Sun et al. Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices, 2010, NIPS.

[17] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.

[18] Yee Whye Teh et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[19] F. Malrieu et al. Quantitative Estimates for the Long-Time Behavior of an Ergodic Variant of the Telegraph Process, 2010, Advances in Applied Probability.

[20] E. A. J. F. Peters et al. Rejection-free Monte Carlo sampling for general potentials, 2011, Physical Review E.

[21] C. Hwang et al. Accelerating reversible Markov chains, 2013.

[22] Pierre Monmarché. Hypocoercive relaxation to equilibrium for some kinetic models via a third order differential inequality, 2013, arXiv:1306.4548.

[23] Xiangyu Wang et al. Parallelizing MCMC via Weierstrass Sampler, 2013, arXiv:1312.4605.

[24] Chong Wang et al. Asymptotically Exact, Embarrassingly Parallel MCMC, 2013, UAI.

[25] Ryan P. Adams et al. Firefly Monte Carlo: Exact MCMC with Subsets of Data, 2014, UAI.

[26] R. Handel. Probability in High Dimension, 2014.

[27] Florent Malrieu. Long time behavior of telegraph processes under convex potentials, 2015, arXiv:1507.03503.

[28] Tianqi Chen et al. A Complete Recipe for Stochastic Gradient MCMC, 2015, NIPS.

[29] G. Roberts et al. A piecewise deterministic scaling limit of Lifted Metropolis-Hastings in the Curie-Weiss model, 2015, arXiv:1509.00302.

[30] K. Spiliopoulos et al. Irreversible Langevin samplers and variance reduction: a large deviations approach, 2014, arXiv:1404.0105.

[31] K. Zygalakis et al. (Non-)asymptotic properties of Stochastic Gradient Langevin Dynamics, 2015, arXiv:1501.00438.

[32] Volkan Cevher et al. WASP: Scalable Bayes via barycenters of subset posteriors, 2015, AISTATS.

[33] A. Doucet et al. The Bouncy Particle Sampler: A Nonreversible Rejection-Free Markov Chain Monte Carlo Method, 2015, arXiv:1510.02451.

[34] P. Jacob et al. On nonnegative unbiased estimators, 2013, arXiv:1309.6473.

[35] D. Dunson et al. Simple, scalable and accurate posterior interval estimation, 2016, arXiv:1605.04029.

[36] Edward I. George et al. Bayes and big data: the consensus Monte Carlo algorithm, 2016, Big Data and Information Theory.

[37] P. Fearnhead et al. The Scalable Langevin Exact Algorithm: Bayesian Inference for Big Data, 2016.

[38] Alexander J. Smola et al. Variance Reduction in Stochastic Gradient Langevin Dynamics, 2016, NIPS.

[39] G. Pavliotis et al. Variance Reduction Using Nonreversible Langevin Samplers, 2015, Journal of Statistical Physics.

[40] Yee Whye Teh et al. Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics, 2014, Journal of Machine Learning Research.

[41] Yee Whye Teh et al. Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics, 2016, Journal of Machine Learning Research.

[42] Joris Bierkens. Non-reversible Metropolis-Hastings, 2014, Statistics and Computing.

[43] James Zou et al. Quantifying the accuracy of approximate diffusions and Markov chains, 2016, AISTATS.

[44] Arnaud Doucet et al. On Markov chain Monte Carlo methods for tall data, 2015, Journal of Machine Learning Research.

[45] David E. Carlson et al. Stochastic Bouncy Particle Sampler, 2016, ICML.

[46] David B. Dunson et al. Robust and Scalable Bayes via a Median of Subset Posterior Measures, 2014, Journal of Machine Learning Research.

[47] R. Kohn et al. Speeding Up MCMC by Efficient Data Subsampling, 2014, Journal of the American Statistical Association.

[48] Paul Fearnhead et al. Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo, 2016, Statistical Science.

[49] A. Doucet et al. Exponential ergodicity of the bouncy particle sampler, 2017, The Annals of Statistics.

[50] G. Roberts et al. Ergodicity of the zigzag process, 2017, The Annals of Applied Probability.