Accelerating Langevin Sampling with Birth-death

A fundamental problem in Bayesian inference and statistical machine learning is to sample efficiently from multimodal distributions. Because of metastability, such distributions are difficult to sample from with standard Markov chain Monte Carlo methods. We propose a new sampling algorithm that uses a birth-death mechanism to accelerate the mixing of Langevin diffusion. The algorithm is motivated by its mean-field partial differential equation (PDE), a Fokker-Planck equation supplemented by a nonlocal birth-death term. This PDE can be viewed as a gradient flow of the Kullback-Leibler divergence with respect to the Wasserstein-Fisher-Rao metric. We prove that, under some assumptions, the asymptotic convergence rate of the nonlocal PDE is independent of the potential barrier, in contrast to the exponential dependence on the barrier in the case of Langevin diffusion. We illustrate the efficiency of the birth-death accelerated Langevin method through several analytical examples and numerical experiments.
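
For orientation, one way to write a PDE of the kind described above is sketched here. The notation is an assumption and is not fixed by the abstract: $V$ denotes the potential, $\pi \propto e^{-V}$ the target density, and $\rho_t$ the density of the sampler at time $t$. The first term is the Fokker-Planck operator of the overdamped Langevin diffusion; the second is a nonlocal birth-death term that removes mass where $\rho_t$ exceeds $\pi$ relative to the average and adds it where $\rho_t$ falls short.

```latex
\begin{aligned}
\partial_t \rho_t
  &= \nabla \cdot \bigl( \nabla \rho_t + \rho_t \nabla V \bigr)
     \;-\; \alpha_t \, \rho_t , \\
\alpha_t(x)
  &= \log \frac{\rho_t(x)}{\pi(x)}
     \;-\; \int \rho_t(y) \, \log \frac{\rho_t(y)}{\pi(y)} \, dy ,
\qquad \pi \propto e^{-V} .
\end{aligned}
```

In this form the subtracted integral is the Kullback-Leibler divergence $\mathrm{KL}(\rho_t \,\|\, \pi)$, so $\int \alpha_t \rho_t \, dx = 0$ and the birth-death term preserves total mass. Both terms vanish at $\rho_t = \pi$, which is consistent with $\pi$ being the stationary state of the flow.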
