Optimal thinning of MCMC output

The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically, a number of the initial states are discarded as "burn-in", whilst the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method, and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in both Python and MATLAB.
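To make the greedy construction concrete, the sketch below illustrates greedy kernel Stein discrepancy minimisation in Python/NumPy. It is a minimal illustration, not the API of the Stein Thinning package: it assumes an inverse multiquadric base kernel (c2 + ||x - y||^2)^(-1/2) with a fixed scale c2, and access to the score function grad log p evaluated at each state of the chain; the names imq_stein_kernel and greedy_stein_thin are purely illustrative.

import numpy as np


def imq_stein_kernel(x, y, sx, sy, c2=1.0):
    """Langevin Stein kernel built on the IMQ base kernel
    k(x, y) = (c2 + ||x - y||^2)^(-1/2).

    x, y   : arrays broadcastable to shape (n, d)
    sx, sy : corresponding score evaluations, grad log p(.)
    Returns row-wise Stein kernel values, shape (n,).
    """
    d = y.shape[1]
    diff = x - y                               # (n, d)
    r2 = np.sum(diff ** 2, axis=1)             # squared distances
    base = c2 + r2
    term1 = (d + np.sum((sx - sy) * diff, axis=1)) * base ** -1.5
    term2 = -3.0 * r2 * base ** -2.5
    term3 = np.sum(sx * sy, axis=1) * base ** -0.5
    return term1 + term2 + term3


def greedy_stein_thin(samples, scores, m, c2=1.0):
    """Greedily select m indices from the sample path so that the
    kernel Stein discrepancy of the retained empirical distribution
    is (approximately) minimised.

    samples : (n, d) states of the MCMC sample path
    scores  : (n, d) values of grad log p at each state
    m       : number of states to retain
    """
    n = samples.shape[0]
    # Diagonal terms k_P(x_i, x_i), reused in every greedy step.
    k_diag = imq_stein_kernel(samples, samples, scores, scores, c2)
    running = np.zeros(n)      # sum over selected j of k_P(x_j, x_i)
    idx = np.empty(m, dtype=int)
    for j in range(m):
        # Greedy objective: 0.5 * k_P(x_i, x_i) + cross terms with
        # the states selected so far.
        idx[j] = np.argmin(0.5 * k_diag + running)
        x_new = samples[idx[j]][None, :]
        s_new = scores[idx[j]][None, :]
        running += imq_stein_kernel(x_new, samples, s_new, scores, c2)
    return idx


# Example usage: compress a long, correlated chain to 100 states.
# idx = greedy_stein_thin(samples, scores, m=100)
# compressed = samples[idx]

In practice the kernel scale (c2 here) would be set by a preconditioning or median-type heuristic rather than fixed to a default; the constant used above is purely for illustration.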
