Dimensionally Tight Running Time Bounds for Second-Order Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is a widely deployed method for sampling from high-dimensional distributions in statistics and machine learning. HMC is known to run very efficiently in practice, and its second-order variant was conjectured in 1988 to require only $d^{1/4}$ steps. Here we show that this conjecture is true when sampling from strongly log-concave target distributions that satisfy weak third-order regularity properties associated with the input data. This improves upon a recent result showing that the number of steps of the second-order discretization of HMC grows like $d^{1/4}$ under the much stronger assumption that the distribution is separable and its first four Fr\'echet derivatives are bounded. Our result also compares favorably with the best available running time bounds for the class of strongly log-concave distributions, namely the current best bounds for the overdamped and underdamped Langevin algorithms and for first-order HMC, all of which grow like $d^{1/2}$ with the dimension. Key to our result is a new regularity condition for the Hessian that may be of independent interest. The class of distributions satisfying this condition is natural and includes posterior distributions arising in Bayesian logistic "ridge" regression.
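
To make the setting concrete, below is a minimal sketch of one step of the second-order (leapfrog/Störmer-Verlet) discretization of Hamiltonian dynamics, applied to a Bayesian logistic "ridge" regression posterior of the kind covered by the regularity condition above. The data `X`, labels `y`, ridge parameter `lam`, and step size `eta` are hypothetical placeholders; the sketch illustrates only the standard leapfrog integrator, not the paper's analysis, and a full HMC sampler would also resample the momentum and (in the Metropolized variant) apply an accept/reject step.

```python
import numpy as np

def neg_log_posterior_grad(theta, X, y, lam):
    # Gradient of the negative log-posterior for Bayesian logistic "ridge" regression:
    #   f(theta) = sum_i log(1 + exp(-y_i * x_i^T theta)) + (lam / 2) * ||theta||^2.
    # The ridge term makes f lam-strongly convex, so exp(-f) is strongly log-concave.
    z = y * (X @ theta)
    sigmoid = 1.0 / (1.0 + np.exp(-z))
    return -X.T @ (y * (1.0 - sigmoid)) + lam * theta

def leapfrog_step(theta, p, eta, grad_f, *args):
    # One second-order (leapfrog / Stormer-Verlet) step with unit mass:
    # half momentum update, full position update, half momentum update.
    p = p - 0.5 * eta * grad_f(theta, *args)
    theta = theta + eta * p
    p = p - 0.5 * eta * grad_f(theta, *args)
    return theta, p

# Toy usage with synthetic data (dimensions, lam, and eta are illustrative only).
rng = np.random.default_rng(0)
d, n, lam, eta = 10, 100, 1.0, 0.05
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = rng.choice([-1.0, 1.0], size=n)
theta, p = np.zeros(d), rng.normal(size=d)
for _ in range(20):
    theta, p = leapfrog_step(theta, p, eta, neg_log_posterior_grad, X, y, lam)
```

The ridge penalty is what places this posterior in the strongly log-concave class discussed above: the negative log-posterior is $\lambda$-strongly convex regardless of the design matrix.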
