Adaptive Hamiltonian and Riemann manifold Monte Carlo samplers

In this paper we address the widely experienced difficulty of tuning Monte Carlo samplers based on simulating Hamiltonian dynamics. We develop an algorithm that adapts the parameters of Hamiltonian and Riemann manifold Hamiltonian Monte Carlo samplers using Bayesian optimization, and that permits infinite adaptation of those parameters. We show that the resulting samplers are ergodic, and that our adaptive algorithms make it easy to obtain more efficient samplers, in some cases precluding the need for more complex solutions. Hamiltonian-based Monte Carlo samplers are widely regarded as an excellent choice of MCMC method, and with this paper we aim to remove a key obstacle to their more widespread use in practice.
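To make the tuning problem concrete, the sketch below implements plain HMC on a standard Gaussian and selects the leapfrog step size by maximizing expected squared jump distance (ESJD) over a small grid. The grid search is a deliberately simple stand-in for the Bayesian optimization procedure the paper proposes; all function names and parameter values here are illustrative, not the paper's implementation.

```python
import numpy as np

def leapfrog(q, p, grad_logp, eps, n_steps):
    """Leapfrog integrator for Hamiltonian dynamics."""
    p = p + 0.5 * eps * grad_logp(q)
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p + eps * grad_logp(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_logp(q)
    return q, p

def hmc(logp, grad_logp, q0, eps, n_steps, n_samples, rng):
    """Basic HMC with fixed step size eps and trajectory length n_steps."""
    q = np.asarray(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    accepts = 0
    for i in range(n_samples):
        p = rng.standard_normal(q.shape)
        q_prop, p_prop = leapfrog(q, p, grad_logp, eps, n_steps)
        # Metropolis accept/reject on the change in total energy.
        log_alpha = (logp(q_prop) - 0.5 * p_prop @ p_prop) - (logp(q) - 0.5 * p @ p)
        if np.log(rng.uniform()) < log_alpha:
            q, accepts = q_prop, accepts + 1
        samples[i] = q
    return samples, accepts / n_samples

def esjd(samples):
    """Expected squared jump distance: larger usually indicates better mixing."""
    diffs = np.diff(samples, axis=0)
    return float(np.mean(np.sum(diffs**2, axis=1)))

# Demo: tune eps for a standard 2-D Gaussian by maximizing ESJD over a grid;
# the paper replaces this exhaustive search with Bayesian optimization.
logp = lambda q: -0.5 * q @ q
grad_logp = lambda q: -q
rng = np.random.default_rng(0)
best_eps = max((0.1, 0.5, 1.0),
               key=lambda e: esjd(hmc(logp, grad_logp, np.zeros(2), e, 10, 200, rng)[0]))
samples, rate = hmc(logp, grad_logp, np.zeros(2), best_eps, 10, 1000, rng)
```

The two free parameters, `eps` and `n_steps`, are exactly the quantities whose manual tuning the paper aims to automate; ESJD is one possible adaptation objective, chosen here only for brevity.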
