Analysis of Langevin Monte Carlo via Convex Optimization

In this paper, we provide new insights into the Unadjusted Langevin Algorithm (ULA). We show that this method can be formulated as a first-order optimization algorithm for an objective functional defined on the Wasserstein space of order $2$. Using this interpretation and techniques borrowed from convex optimization, we give a non-asymptotic analysis of this method for sampling from smooth log-concave target distributions on $\mathbb{R}^d$. Our proofs then extend readily to Stochastic Gradient Langevin Dynamics (SGLD), a popular extension of the Unadjusted Langevin Algorithm. Finally, this interpretation leads to a new methodology for sampling from non-smooth target distributions, for which we carry out a similar analysis.
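One way to read the optimization interpretation is sketched below. It assumes a target $\pi \propto e^{-U}$ with $U$ convex and continuously differentiable with Lipschitz gradient. The notation $\mathcal{E}_U$, $\mathcal{H}$, $\mu_{k+1/2}$ and the step size $\gamma$ is introduced here for illustration and need not match the paper's. In this reading, the objective is the Kullback-Leibler divergence to $\pi$, which splits into a potential energy and a negative entropy. One ULA step, with $\mu_k$ the law of $X_k$, then acts as a gradient step on the potential energy followed by the exact Wasserstein gradient flow of the entropy, which is the heat flow.

$$
\begin{aligned}
\mathrm{KL}(\mu \,\|\, \pi) &= \underbrace{\int_{\mathbb{R}^d} U \,\mathrm{d}\mu}_{\mathcal{E}_U(\mu)}
  + \underbrace{\int_{\mathbb{R}^d} \mu(x) \log \mu(x) \,\mathrm{d}x}_{\mathcal{H}(\mu)}
  + \log \int_{\mathbb{R}^d} e^{-U(y)} \,\mathrm{d}y, \\
X_{k+1} &= X_k - \gamma \nabla U(X_k) + \sqrt{2\gamma}\, Z_{k+1}, \qquad Z_{k+1} \sim \mathcal{N}(0, I_d) \text{ independent of } X_k, \\
\mu_{k+1/2} &= (\mathrm{Id} - \gamma \nabla U)_{\#}\, \mu_k
  \qquad \text{(gradient step on } \mathcal{E}_U\text{)}, \\
\mu_{k+1} &= \mu_{k+1/2} * \mathcal{N}(0, 2\gamma I_d)
  \qquad \text{(heat flow } \partial_t \rho = \Delta \rho \text{ run for time } \gamma\text{, the gradient flow of } \mathcal{H}\text{)}.
\end{aligned}
$$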
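The recursion analyzed here can be illustrated with a minimal NumPy sketch. It assumes a target $\pi \propto e^{-U}$ with $\nabla U$ available in closed form. The function name `ula`, its parameters (`step`, `n_steps`), the two Gaussian test targets and the synthetic data `y` are hypothetical illustration choices, not the paper's code or experiments. SGLD is obtained from the same loop by passing an unbiased minibatch estimate of $\nabla U$ in place of the exact gradient.

```python
import numpy as np


def ula(grad_U, x0, step, n_steps, rng):
    """Run the Unadjusted Langevin Algorithm on a batch of independent chains.

    x0 has shape (n_chains, d). Each iteration applies
        X_{k+1} = X_k - step * grad_U(X_k) + sqrt(2 * step) * Z_{k+1},
    with Z_{k+1} standard Gaussian and no Metropolis accept/reject step.
    Passing an unbiased stochastic estimate of grad U as grad_U gives SGLD.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x


rng = np.random.default_rng(0)

# Example 1 (ULA, hypothetical target): N(m, I_2), U(x) = |x - m|^2 / 2, grad U(x) = x - m.
m = np.array([3.0, -1.0])
x_ula = ula(lambda x: x - m, np.zeros((20_000, 2)), step=0.1, n_steps=200, rng=rng)
# For this target the stationary law of the ULA chain is Gaussian with mean m and
# per-coordinate variance 2 / (2 - step), about 1.053 rather than 1 (discretization bias).

# Example 2 (SGLD, hypothetical finite-sum potential): U(x) = sum_i |x - y_i|^2 / 2,
# so pi = N(mean of the y_i, I_2 / N) and grad U is N-Lipschitz.
y = rng.standard_normal((100, 2)) + np.array([1.0, 2.0])
N, batch = y.shape[0], 10


def stoch_grad_U(x):
    # Unbiased estimate of sum_i (x - y_i): each chain draws its own minibatch of
    # indices uniformly with replacement, and the batch sum is rescaled by N / batch.
    idx = rng.integers(0, N, size=(x.shape[0], batch))
    return (N / batch) * (x[:, None, :] - y[idx]).sum(axis=1)


# step * N = 0.1, well inside the stable range step < 2 / N.
x_sgld = ula(stoch_grad_U, np.zeros((5_000, 2)), step=1e-3, n_steps=500, rng=rng)

print("ULA  mean", x_ula.mean(axis=0), "var", x_ula.var(axis=0))
print("SGLD mean", x_sgld.mean(axis=0), "target mean", y.mean(axis=0))
```

Running many independent chains in parallel, rather than one long chain, makes it easy to compare the empirical moments with the closed-form stationary moments of the Gaussian examples. The minibatch noise in SGLD adds to the injected Gaussian noise, so its spread is somewhat larger than that of ULA with the exact gradient.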
