Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling

Sampling from an unnormalized probability distribution is a fundamental problem in machine learning, with applications including Bayesian modeling, latent factor inference, and energy-based model training. After decades of research, variations of MCMC remain the default approach to sampling despite slow convergence. Auxiliary neural models can learn to speed up MCMC, but the overhead of training the extra model can be prohibitive. We propose a fundamentally different approach to this problem via a new Hamiltonian dynamics with non-Newtonian momentum. In contrast to MCMC approaches like Hamiltonian Monte Carlo, no stochastic step is required. Instead, the proposed deterministic dynamics in an extended state space exactly sample the target distribution, specified by an energy function, under an assumption of ergodicity. Alternatively, the dynamics can be interpreted as a normalizing flow that samples a specified energy model without training. The proposed Energy Sampling Hamiltonian (ESH) dynamics have a simple form that can be solved with existing ODE solvers, but we derive a specialized solver that exhibits much better performance. ESH dynamics converge faster than their MCMC competitors, enabling faster, more stable training of neural network energy models.
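
As a concrete illustration of the "existing ODE solvers" route, the sketch below integrates ESH-style dynamics with SciPy's general-purpose solver. It assumes the time-rescaled form of the dynamics, in which the position x moves at unit speed along a direction u confined to the unit sphere while r tracks the log of the speed; the toy standard-Gaussian energy, `grad_energy`, and all parameter choices are placeholders for illustration, not the paper's setup.

```python
import numpy as np
from scipy.integrate import solve_ivp

def grad_energy(x):
    # Toy stand-in: standard Gaussian energy E(x) = |x|^2 / 2, so grad E(x) = x.
    # Replace with the gradient of any differentiable energy function.
    return x

def esh_field(t, state, d):
    # Extended state (x, u, r): position, unit-norm velocity direction, log-speed.
    # Assumed time-rescaled ESH form:
    #   dx/dt = u
    #   du/dt = -(I - u u^T) grad E(x) / d   (gradient projected onto the sphere)
    #   dr/dt = -u . grad E(x) / d
    x, u = state[:d], state[d:2 * d]
    g = grad_energy(x)
    du = -(g - np.dot(g, u) * u) / d
    dr = -np.dot(g, u) / d
    return np.concatenate([u, du, [dr]])

d = 2
rng = np.random.default_rng(0)
x0 = rng.normal(size=d)
u0 = rng.normal(size=d)
u0 /= np.linalg.norm(u0)                      # start on the unit sphere

sol = solve_ivp(esh_field, (0.0, 50.0),
                np.concatenate([x0, u0, [0.0]]),
                args=(d,), rtol=1e-6, atol=1e-8)
xs = sol.y[:d].T                              # deterministic trajectory of positions
```

A general-purpose integrator like this lets u drift slightly off the unit sphere over long trajectories, which is one motivation for a specialized solver; renormalizing u between solver calls is a simple mitigation within this sketch.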
