Global Non-convex Optimization with Discretized Diffusions

An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory. Our non-asymptotic analysis delivers computable optimization and integration error bounds based on easily accessed properties of the objective and chosen diffusion. Central to our approach are new explicit Stein factor bounds on the solutions of Poisson equations. We complement these results with improved optimization guarantees for targets other than the standard Gibbs measure.

[1]  I. S. Gradshteyn,et al.  Table of Integrals, Series, and Products , 1976 .

[2]  R. Khasminskii Stochastic Stability of Differential Equations , 1980 .

[3]  B. Øksendal Stochastic Differential Equations , 1985 .

[4]  S. Mitter,et al.  Recursive stochastic algorithms for global optimization in R d , 1991 .

[5]  A. M. Mathai,et al.  Quadratic forms in random variables : theory and applications , 1992 .

[6]  K. Elworthy,et al.  Formulae for the Derivatives of Heat Semigroups , 1994, 1911.10971.

[7]  S. Cerrai Second Order Pde's in Finite and Infinite Dimension: A Probabilistic Approach , 2001 .

[8]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[9]  A. Veretennikov,et al.  On the poisson equation and diffusion approximation 3 , 2001, math/0506596.

[10]  Jonathan C. Mattingly,et al.  Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise , 2002 .

[11]  Michael I. Jordan,et al.  Convexity, Classification, and Risk Bounds , 2006 .

[12]  Arnak S. Dalalyan,et al.  Sparse Regression Learning by Aggregation and Langevin Monte-Carlo , 2009, COLT.

[13]  Andrew M. Stuart,et al.  Convergence of Numerical Time-Averaging and Stationary Measures via Poisson Equations , 2009, SIAM J. Numer. Anal..

[14]  Yee Whye Teh,et al.  Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.

[15]  R. Khasminskii Stability of Stochastic Differential Equations , 2012 .

[16]  G. J. O. Jameson,et al.  Inequalities for Gamma Function Ratios , 2013, Am. Math. Mon..

[17]  P. Cattiaux,et al.  Semi Log-Concave Markov Diffusions , 2013, 1303.6884.

[18]  A. Dalalyan Theoretical guarantees for approximate sampling from smooth and log‐concave densities , 2014, 1412.7392.

[19]  Tianqi Chen,et al.  A Complete Recipe for Stochastic Gradient MCMC , 2015, NIPS.

[20]  É. Moulines,et al.  Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm , 2015, 1507.05021.

[21]  Lawrence Carin,et al.  On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators , 2015, NIPS.

[22]  Lester W. Mackey,et al.  Measuring Sample Quality with Diffusions , 2016, The Annals of Applied Probability.

[23]  Feng-Yu Wang Exponential Contraction in Wasserstein Distances for Diffusion Semigroups with Negative Curvature , 2016, Potential Analysis.

[24]  Dilin Wang,et al.  Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm , 2016, NIPS.

[25]  A. Eberle Couplings, distances and contractivity for diffusion processes revisited , 2013 .

[26]  Yee Whye Teh,et al.  Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics , 2016, J. Mach. Learn. Res..

[27]  Arnak S. Dalalyan,et al.  Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent , 2017, COLT.

[28]  Matus Telgarsky,et al.  Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis , 2017, COLT.

[29]  Jinghui Chen,et al.  Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization , 2017, NeurIPS.

[30]  Michael I. Jordan,et al.  Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting , 2018, ArXiv.

[31]  Mert Gürbüzbalaban,et al.  Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration , 2018, Oper. Res..

[32]  Martin J. Wainwright,et al.  Log-concave sampling: Metropolis-Hastings algorithms are fast! , 2018, COLT.