Langevin Monte Carlo without Smoothness

Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant. The nonasymptotic dependence of its mixing time on the dimension and target accuracy is understood mainly in the setting of smooth (gradient-Lipschitz) log-densities, a serious limitation for applications in machine learning. In this paper, we remove this limitation by providing polynomial-time convergence guarantees for a variant of LMC in the setting of nonsmooth log-concave distributions. At a high level, our results follow by leveraging the implicit smoothing of the log-density induced by a small Gaussian perturbation added to the iterates of the algorithm, and by controlling the bias and variance that this perturbation introduces.
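
For concreteness, the sketch below illustrates one way such a perturbed LMC update can be realized: the (sub)gradient of a nonsmooth log-concave potential is evaluated at a Gaussian-perturbed copy of the current iterate, which amounts to a stochastic gradient of the Gaussian-smoothed potential. This is a minimal sketch under stated assumptions, not the authors' exact algorithm; the example potential, the names `subgrad_U` and `perturbed_lmc`, and the parameters `eta` (step size) and `mu` (smoothing level) are illustrative choices, not taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's exact method) of LMC with a
# Gaussian perturbation of the iterates, for a nonsmooth log-concave target.

import numpy as np

def subgrad_U(x, lam=1.0):
    """Subgradient of the example potential U(x) = lam*||x||_1 + 0.5*||x||^2.

    The target density is proportional to exp(-U(x)); U is log-concave but not
    gradient-Lipschitz because of the l1 term.
    """
    return lam * np.sign(x) + x

def perturbed_lmc(dim, n_iters, eta, mu, rng):
    """Run LMC, evaluating the (sub)gradient at a Gaussian-perturbed point.

    Evaluating subgrad_U at x + mu*z with z ~ N(0, I) gives a stochastic
    gradient of the Gaussian-smoothed potential, which supplies the "implicit
    smoothing" described in the abstract, at the cost of extra bias/variance.
    """
    x = np.zeros(dim)
    samples = np.empty((n_iters, dim))
    for k in range(n_iters):
        z = rng.standard_normal(dim)          # smoothing perturbation
        g = subgrad_U(x + mu * z)             # subgradient at the perturbed point
        xi = rng.standard_normal(dim)         # Langevin noise
        x = x - eta * g + np.sqrt(2.0 * eta) * xi
        samples[k] = x
    return samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = perturbed_lmc(dim=10, n_iters=20_000, eta=1e-3, mu=1e-2, rng=rng)
    # Discard burn-in and inspect a simple statistic of the chain.
    print("mean l1 norm of retained samples:", np.abs(samples[5_000:]).sum(axis=1).mean())
```

The key design point in this sketch is that the smoothing level `mu` trades off how smooth the effective potential is against the bias it introduces in the stationary distribution; the abstract's analysis is precisely about controlling that bias and the accompanying variance.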
