A Gradient Based Strategy for Hamiltonian Monte Carlo Hyperparameter Optimization

Hamiltonian Monte Carlo (HMC) is one of the most successful sampling methods in machine learning. However, its performance is significantly affected by the choice of hyperparameter values. Existing approaches for optimizing the HMC hyperparameters either optimize a proxy for mixing speed or consider the HMC chain as an implicit variational distribution and optimize a tractable lower bound that can be very loose in practice. Instead, we propose to optimize an objective that directly quantifies the speed of convergence to the target distribution. Our objective can be easily optimized using stochastic gradient descent. We evaluate our proposed method and compare it to baselines on a variety of problems, including sampling from synthetic 2D distributions, reconstructing sparse signals, learning deep latent variable models, and sampling molecular configurations from the Boltzmann distribution of a 22-atom molecule. We find that our method is competitive with or improves upon alternative baselines in all these experiments.
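To illustrate the general mechanism of gradient-based HMC hyperparameter tuning, the sketch below unrolls a differentiable leapfrog integrator with learnable step size and diagonal mass, and updates these hyperparameters by stochastic gradient descent. This is only a minimal illustration under stated assumptions, not the paper's method: the target density, the number of leapfrog steps, and especially the loss (expected squared jump distance, one of the mixing-speed proxies mentioned above) are stand-ins, and the Metropolis accept/reject step is omitted.

```python
# Minimal sketch: differentiable leapfrog HMC with hyperparameters tuned by SGD.
# The squared-jump-distance loss is an illustrative proxy, not the paper's
# convergence-speed objective. Target and settings are hypothetical.
import torch

def log_prob(x):
    # Hypothetical target: standard 2D Gaussian.
    return -0.5 * (x ** 2).sum(-1)

def leapfrog(x, p, log_eps, log_mass, n_steps=5):
    eps = log_eps.exp()
    inv_mass = (-log_mass).exp()
    x = x.clone().requires_grad_(True)
    grad = torch.autograd.grad(log_prob(x).sum(), x, create_graph=True)[0]
    p = p + 0.5 * eps * grad                      # initial half momentum step
    for _ in range(n_steps):
        x = x + eps * inv_mass * p                # full position step
        grad = torch.autograd.grad(log_prob(x).sum(), x, create_graph=True)[0]
        p = p + eps * grad                        # full momentum step
    p = p - 0.5 * eps * grad                      # correct the last step to a half step
    return x, p

log_eps = torch.zeros(1, requires_grad=True)      # log step size
log_mass = torch.zeros(2, requires_grad=True)     # log diagonal mass matrix
opt = torch.optim.SGD([log_eps, log_mass], lr=1e-2)

for it in range(200):
    x0 = torch.randn(128, 2) * 3.0                # arbitrary initial distribution
    p0 = torch.randn_like(x0) * (0.5 * log_mass).exp()
    x1, _ = leapfrog(x0, p0, log_eps, log_mass)
    loss = -((x1 - x0) ** 2).sum(-1).mean()       # maximize expected squared jump distance
    opt.zero_grad()
    loss.backward()                               # gradients flow through the unrolled dynamics
    opt.step()
```

Because the leapfrog updates are differentiable in the step size and mass, any scalar objective computed from the resulting samples can be minimized with an off-the-shelf optimizer; the paper's contribution is the choice of an objective that directly measures convergence speed rather than a proxy like the one used here.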
