Auxiliary gradient‐based sampling algorithms

We introduce a new family of MCMC samplers that combine auxiliary variables, Gibbs sampling and Taylor expansions of the target density. Our approach permits marginalisation over the auxiliary variables, yielding marginal samplers, or augmentation of the target with the auxiliary variables, yielding auxiliary samplers. The well-known Metropolis-adjusted Langevin algorithm (MALA) and the preconditioned Crank-Nicolson Langevin (pCNL) algorithm are shown to be special cases. We prove that marginal samplers are superior in terms of asymptotic variance and demonstrate cases where they are slower in computing time than auxiliary samplers. In the context of latent Gaussian models we propose new auxiliary and marginal samplers whose implementation requires a single tuning parameter, which can be found automatically during the transient phase. Extensive experimentation shows that the increase in efficiency (measured as effective sample size per unit of computing time) relative to (optimised implementations of) pCNL, elliptical slice sampling and MALA ranges from 10-fold in binary classification problems, to 25-fold in log-Gaussian Cox processes, to 100-fold in Gaussian process regression, and it is on a par with Riemann manifold Hamiltonian Monte Carlo in an example where the latter has the same complexity as the aforementioned algorithms. We explain this remarkable improvement in terms of the way in which alternative samplers try to approximate the eigenvalues of the target. We introduce a novel MCMC sampling scheme for hyperparameter learning that builds upon the auxiliary samplers. The MATLAB code for reproducing the experiments in the article is publicly available, and a Supplement to this article contains additional experiments and implementation details.
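
To fix ideas, the following is a minimal MATLAB sketch of MALA, one of the special cases named above; it is not the article's auxiliary or marginal sampler. The target (a standard Gaussian through the illustrative handles log_pi and grad_log_pi), the step size h and all other settings are assumptions chosen for the example.

```matlab
% Minimal MALA sketch: Langevin proposal plus Metropolis-Hastings correction.
% Target and tuning are illustrative (standard Gaussian, fixed step size h).
log_pi      = @(x) -0.5 * (x' * x);   % log target density (up to a constant)
grad_log_pi = @(x) -x;                % gradient of the log target
d = 2; h = 0.5; n_iter = 5000;
x = zeros(d, 1);
samples = zeros(d, n_iter);
for t = 1:n_iter
    % Langevin proposal: half gradient step plus Gaussian noise
    mu_x = x + 0.5 * h * grad_log_pi(x);
    y    = mu_x + sqrt(h) * randn(d, 1);
    % Accept-reject step; the proposal is asymmetric, so both q terms appear
    mu_y     = y + 0.5 * h * grad_log_pi(y);
    log_q_xy = -norm(y - mu_x)^2 / (2 * h);   % log q(y | x)
    log_q_yx = -norm(x - mu_y)^2 / (2 * h);   % log q(x | y)
    log_alpha = log_pi(y) - log_pi(x) + log_q_yx - log_q_xy;
    if log(rand) < log_alpha
        x = y;                                % accept the proposal
    end
    samples(:, t) = x;
end
```

In this sketch the step size h plays the role of the single tuning parameter; in practice it would be adapted (for example, during the transient phase) to reach a sensible acceptance rate.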
