Accelerated Stochastic Mirror Descent: From Continuous-time Dynamics to Discrete-time Algorithms

We present a new framework for analyzing accelerated stochastic mirror descent through the lens of continuous-time stochastic dynamical systems. The framework lets us design new algorithms and gives a unified, simple analysis of their convergence rates. Specifically, we provide a Lyapunov-function-based analysis of the continuous-time stochastic dynamics, and derive several new discrete-time algorithms from those dynamics. We show that for general convex objective functions, the derived discrete-time algorithms attain the optimal convergence rate. Empirical experiments corroborate our theory.
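To ground the terminology, the following is a minimal sketch of plain (non-accelerated) stochastic mirror descent with the entropic mirror map on the probability simplex; the accelerated dynamics studied in the paper build on this primitive. The function names, the toy quadratic objective, the step-size constant, and the Gaussian noise model are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def stochastic_mirror_descent(grad_fn, x0, steps, c=0.5, noise=0.1, rng=None):
    """Entropic (exponentiated-gradient) stochastic mirror descent on the simplex.

    grad_fn returns the exact gradient; Gaussian noise is added to mimic a
    stochastic first-order oracle. Step sizes decay as eta_k = c / sqrt(k + 1).
    Returns the averaged iterate, which is what standard O(1/sqrt(n)) rates bound.
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    avg = np.zeros_like(x0)
    for k in range(steps):
        g = grad_fn(x) + noise * rng.standard_normal(x.shape)  # noisy gradient
        eta = c / np.sqrt(k + 1)
        x = x * np.exp(-eta * g)   # mirror step under the negative-entropy map
        x /= x.sum()               # Bregman projection back onto the simplex
        avg += x
    return avg / steps

# Toy problem: minimize ||x - x_star||^2 over the probability simplex,
# whose unconstrained minimizer x_star already lies in the simplex.
x_star = np.array([0.7, 0.2, 0.1])
grad = lambda x: 2.0 * (x - x_star)
x0 = np.full(3, 1.0 / 3.0)
x_hat = stochastic_mirror_descent(grad, x0, steps=2000)
```

The multiplicative update followed by normalization is exactly the mirror step induced by the negative-entropy distance-generating function; swapping in the squared Euclidean norm would recover projected stochastic gradient descent.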