Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling

Monte Carlo sampling for Bayesian posterior inference is a common approach in machine learning. The Markov chain Monte Carlo procedures used in practice are often discrete-time analogues of associated stochastic differential equations (SDEs), which are guaranteed to leave the required posterior distribution invariant. An active area of research concerns the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume it is constant. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
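To illustrate the class of methods the abstract refers to, the following is a minimal sketch of an adaptive Langevin (stochastic gradient Nosé-Hoover) thermostat on a toy one-dimensional Gaussian target whose gradient is deliberately corrupted by noise. It is not the covariance-controlled scheme proposed in the article; the step size `h`, noise amplitude `A`, and toy potential are illustrative assumptions only. The key idea it demonstrates is that the auxiliary friction variable `xi` adapts so that the unknown gradient noise is dissipated and the correct target distribution is recovered.

```python
import numpy as np

def sgnht_sample(grad_u, dim, n_steps=100_000, h=0.01, A=1.0, seed=0):
    """Adaptive Langevin / SGNHT sketch (illustrative, not the paper's method).

    grad_u(theta, rng) returns a *noisy* gradient of the potential U
    (the negative log posterior). The thermostat variable xi adjusts the
    friction so that the extra, unknown gradient noise is dissipated.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    p = rng.standard_normal(dim)  # momentum
    xi = A                        # thermostat (adaptive friction) variable
    samples = np.empty((n_steps, dim))
    for t in range(n_steps):
        # momentum step: friction xi, noisy force, injected noise of strength A
        p += (-xi * p - grad_u(theta, rng)) * h \
             + np.sqrt(2.0 * A * h) * rng.standard_normal(dim)
        theta = theta + p * h
        # thermostat step: steer kinetic energy toward the target (kT = 1)
        xi += h * (p @ p / dim - 1.0)
        samples[t] = theta
    return samples

def noisy_grad(theta, rng, noise_std=2.0):
    """Gradient of U(theta) = theta^2/2 (standard normal target),
    corrupted by Gaussian noise to mimic subsampling error."""
    return theta + noise_std * rng.standard_normal(theta.shape)

if __name__ == "__main__":
    draws = sgnht_sample(noisy_grad, dim=1)
    burn = draws[20_000:]
    print(f"mean={burn.mean():.2f}  var={burn.var():.2f}")
```

Despite the injected gradient noise, the empirical mean and variance of the post-burn-in samples should be close to those of the standard normal target (0 and 1), which is the property that thermostat-based samplers are designed to preserve.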
