Deep Markov Chain Monte Carlo

Author(s): Shahbaba, Babak; Lomeli, Luis Martinez; Chen, Tian; Lan, Shiwei

Abstract: We propose a new, computationally efficient sampling scheme for Bayesian inference involving high-dimensional probability distributions. Our method maps the original parameter space into a low-dimensional latent space, explores the latent space to generate samples, and maps these samples back to the original space for inference. While our method can be used with any dimension reduction technique to obtain the latent space and any standard sampling algorithm to explore it, here we specifically use a combination of autoencoders (for dimensionality reduction) and Hamiltonian Monte Carlo (HMC, for sampling). To this end, we first run HMC to generate initial samples from the original parameter space and use these samples to train an autoencoder. Next, starting with an initial state, we use the encoding part of the autoencoder to map this state to a point in the low-dimensional latent space. Treating this point as the initial state of a second HMC in the latent space, we generate a new latent state, which is then mapped back to the original space by the decoding part of the autoencoder. The resulting point is treated as a Metropolis-Hastings (MH) proposal, which is either accepted or rejected. While the induced dynamics in the parameter space are no longer Hamiltonian, they remain time reversible, and the Markov chain can still converge to the canonical distribution when a volume correction term is included. Dropping the volume correction step results in convergence to an approximate but reasonably accurate distribution. Empirical results on several high-dimensional problems show that our method can substantially reduce the computational cost of Bayesian inference.

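The following is a minimal sketch of the sampling scheme outlined in the abstract: encode the current state, run an HMC trajectory in the low-dimensional latent space, decode the endpoint, and apply an MH accept/reject step. Everything here is an illustrative assumption rather than the authors' implementation: the toy Gaussian `log_target`, the linear `encode`/`decode` maps (standing in for a trained autoencoder), the finite-difference latent gradient, and all tuning constants are hypothetical. The volume-correction term is omitted, so this corresponds to the approximate variant mentioned in the abstract; in practice the step size and number of leapfrog steps would also need tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

D, d = 100, 5                                   # original and latent dimensions (assumed)
A = np.linalg.qr(rng.normal(size=(D, d)))[0]    # orthonormal columns for the toy linear maps

def log_target(x):
    # Toy target: standard Gaussian in R^D (placeholder for the real posterior).
    return -0.5 * np.dot(x, x)

def encode(x):
    # Stand-in for the encoder of a trained autoencoder.
    return A.T @ x

def decode(z):
    # Stand-in for the decoder of a trained autoencoder.
    return A @ z

def latent_grad(z, eps=1e-5):
    # Finite-difference gradient of the latent log-density log_target(decode(z)).
    g = np.zeros_like(z)
    f0 = log_target(decode(z))
    for i in range(d):
        zp = z.copy()
        zp[i] += eps
        g[i] = (log_target(decode(zp)) - f0) / eps
    return g

def deep_hmc_step(x, step=0.1, n_leapfrog=20):
    """One encode -> latent HMC -> decode -> MH accept/reject transition."""
    z = encode(x)
    p = rng.normal(size=d)                      # latent momentum, identity mass matrix
    z_new, p_new = z.copy(), p.copy()
    # Leapfrog integration in the low-dimensional latent space.
    p_new += 0.5 * step * latent_grad(z_new)
    for _ in range(n_leapfrog):
        z_new += step * p_new
        p_new += step * latent_grad(z_new)
    p_new -= 0.5 * step * latent_grad(z_new)    # undo the extra half momentum step
    x_prop = decode(z_new)                      # map the proposal back to the original space
    # MH acceptance without the volume-correction term (approximate variant).
    log_alpha = (log_target(x_prop) - log_target(x)
                 + 0.5 * (p @ p) - 0.5 * (p_new @ p_new))
    if np.log(rng.uniform()) < log_alpha:
        return x_prop, True
    return x, False

# Usage: run the chain from a random initial state.
x = rng.normal(size=D)
samples = []
for _ in range(1000):
    x, accepted = deep_hmc_step(x)
    samples.append(x)
```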