Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

We propose a new algorithm, the Stochastic Proximal Langevin Algorithm (SPLA), for sampling from a log-concave distribution. Our method generalizes the Langevin algorithm to potentials expressed as the sum of one stochastic smooth term and multiple stochastic nonsmooth terms. In each iteration, our splitting technique requires only a stochastic gradient of the smooth term and a stochastic proximal operator for each of the nonsmooth terms. We establish nonasymptotic sublinear and linear convergence rates under convexity and strong convexity of the smooth term, respectively, expressed in terms of the KL divergence and the Wasserstein distance. We illustrate the efficiency of our sampling technique through numerical simulations on a Bayesian learning task.
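To make the per-iteration structure concrete, here is a minimal Python sketch of one plausible reading of the splitting described above: a stochastic gradient step on the smooth term, an injected Gaussian perturbation scaled by sqrt(2 * step), and then one proximal step per nonsmooth term. The exact ordering of these steps, the function names (`spla_sketch`, `soft_threshold`, `noisy_grad`, `l1_prox`), and the toy potential (a quadratic plus an l1 term) are assumptions made for illustration and are not taken from the abstract. In this toy the proximal operator is deterministic, which is a simplification of the stochastic proximal operators the method allows.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def spla_sketch(grad_f, proxes, x0, step, n_iters, rng):
    """Illustrative splitting loop (assumed form, not the authors' code).

    grad_f(x, rng)       -> stochastic gradient of the smooth term at x
    proxes               -> list of callables prox(y, step, rng), one per
                            nonsmooth term, each returning prox_{step * g_i}(y)
    """
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        # Forward (gradient) step on the smooth term.
        z = x - step * grad_f(x, rng)
        # Langevin perturbation with variance 2 * step per coordinate.
        y = z + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        # Backward (proximal) steps, one per nonsmooth term, applied in order.
        for prox in proxes:
            y = prox(y, step, rng)
        x = y
        samples[k] = x
    return samples


# Toy potential (hypothetical): U(x) = 0.5 * ||x - mu||^2 + lam * ||x||_1.
mu = np.array([2.0, -1.0])
lam = 1.0


def noisy_grad(x, rng):
    # Exact gradient (x - mu) plus small zero-mean noise to mimic a
    # stochastic gradient oracle.
    return (x - mu) + 0.1 * rng.standard_normal(x.shape)


def l1_prox(y, step, rng):
    # Deterministic prox of step * lam * ||.||_1; rng is unused here.
    return soft_threshold(y, step * lam)


rng = np.random.default_rng(0)
samples = spla_sketch(noisy_grad, [l1_prox], x0=np.zeros(2),
                      step=0.05, n_iters=2000, rng=rng)
print(samples.shape, samples.mean(axis=0))
```

With several nonsmooth terms, one would pass several callables in `proxes`; each iteration then costs one gradient evaluation and one proximal evaluation per term, which matches the access pattern the abstract describes.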
