Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond

Gaussian latent variable models are a key class of Bayesian hierarchical models with applications in many fields. Performing Bayesian inference on such models can be challenging because Markov chain Monte Carlo algorithms struggle with the geometry of the resulting posterior distribution and can be prohibitively slow. An alternative is to use a Laplace approximation to marginalize out the latent Gaussian variables and then integrate out the remaining hyperparameters using dynamic Hamiltonian Monte Carlo, a gradient-based Markov chain Monte Carlo sampler. To implement this scheme efficiently, we derive a novel adjoint method that propagates the minimal information needed to construct the gradient of the approximate marginal likelihood. This strategy yields a scalable method that is orders of magnitude faster than state-of-the-art techniques when the hyperparameters are high-dimensional. We prototype the method in the probabilistic programming framework Stan and test the utility of the embedded Laplace approximation on several models, including one whose hyperparameters are $\sim$6,000-dimensional. Depending on the case, the benefit is either a dramatic speed-up or an alleviation of the geometric pathologies that frustrate Hamiltonian Monte Carlo.
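To make the scheme concrete: for hyperparameters $\phi$, a latent Gaussian field $\theta \sim \mathcal{N}(0, K(\phi))$, and observations $y \sim \pi(y \mid \theta)$, the Laplace approximation replaces the intractable marginal likelihood with

$\log \pi_{\mathcal{G}}(y \mid \phi) = \log \pi(y \mid \theta^*) - \frac{1}{2} \theta^{*T} K^{-1} \theta^* - \frac{1}{2} \log \det (I + K W),$

where $\theta^*$ maximizes $\log \pi(y \mid \theta) - \frac{1}{2} \theta^T K^{-1} \theta$ (typically found by Newton iterations) and $W = -\nabla^2_\theta \log \pi(y \mid \theta^*)$. The sketch below illustrates this computation; it is written in JAX and is not the paper's Stan/C++ implementation. The squared-exponential kernel, the Bernoulli-logit likelihood, and all function names are illustrative choices, and the gradient here is obtained by generic reverse-mode autodiff pushed through the unrolled Newton solver, whereas the paper derives a custom adjoint that propagates only the minimal information this gradient requires.

```python
# A minimal sketch of an embedded Laplace approximation, assuming a
# squared-exponential kernel and a Bernoulli-logit likelihood (both
# illustrative, not the paper's test models).
import jax
import jax.numpy as jnp

def kernel(phi, x):
    # Squared-exponential covariance; phi = (log alpha, log rho).
    alpha, rho = jnp.exp(phi[0]), jnp.exp(phi[1])
    d = x[:, None] - x[None, :]
    return alpha ** 2 * jnp.exp(-0.5 * (d / rho) ** 2) + 1e-6 * jnp.eye(x.shape[0])

def laplace_log_marginal(phi, x, y, n_newton=20):
    # Newton iterations for the mode theta*, in the numerically stable
    # parameterization of Rasmussen & Williams (2006), Algorithm 3.1.
    K = kernel(phi, x)
    n = x.shape[0]
    f = jnp.zeros(n)                       # latent Gaussian field theta
    for _ in range(n_newton):
        p = jax.nn.sigmoid(f)
        W = p * (1.0 - p)                  # W = -Hessian of log pi(y | theta)
        sW = jnp.sqrt(W)
        L = jnp.linalg.cholesky(jnp.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + (y - p)                # y - p = gradient of log pi(y | theta)
        a = b - sW * jax.scipy.linalg.cho_solve((L, True), sW * (K @ b))
        f = K @ a                          # Newton update toward the mode
    log_lik = jnp.sum(y * f - jax.nn.softplus(f))
    # log pi_G(y | phi) = log pi(y | theta*) - 0.5 theta*' K^{-1} theta*
    #                     - 0.5 log det(I + K W)
    return log_lik - 0.5 * jnp.dot(a, f) - jnp.sum(jnp.log(jnp.diag(L)))

# The gradient a dynamic HMC sampler over phi needs at every leapfrog step.
grad_log_marginal = jax.grad(laplace_log_marginal)

# Toy usage: 50 binary observations on a 1-D grid.
x = jnp.linspace(0.0, 10.0, 50)
y = (jnp.sin(x) > 0).astype(jnp.float32)
phi = jnp.zeros(2)
print(laplace_log_marginal(phi, x, y), grad_log_marginal(phi, x, y))
```

Inside dynamic HMC over $\phi$, every leapfrog step needs exactly this pair of evaluations, the approximate log marginal and its gradient. Standard approaches pay for one covariance-sized contraction per hyperparameter when forming the gradient, which is what makes the high-dimensional regime expensive and what the paper's adjoint method avoids.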
