Accelerated Stochastic Mirror Descent: From Continuous-time Dynamics to Discrete-time Algorithms

We present a new framework for analyzing accelerated stochastic mirror descent through the lens of continuous-time stochastic dynamical systems. The framework lets us design new algorithms and gives a unified, simple analysis of their convergence rates. Specifically, we provide a Lyapunov-function-based analysis of the continuous-time stochastic dynamics, and derive several new discrete-time algorithms from those dynamics. We show that for general convex objective functions, the derived discrete-time algorithms attain the optimal convergence rate. Empirical experiments corroborate our theory.
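To ground the terminology, the following is a minimal sketch of plain (non-accelerated) stochastic mirror descent with the entropic mirror map on the probability simplex; the accelerated dynamics studied in the paper build on this primitive. The function names, the toy quadratic objective, the step-size constant, and the Gaussian noise model are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def stochastic_mirror_descent(grad_fn, x0, steps, c=0.5, noise=0.1, rng=None):
    """Entropic (exponentiated-gradient) stochastic mirror descent on the simplex.

    grad_fn returns the exact gradient; Gaussian noise is added to mimic a
    stochastic first-order oracle. Step sizes decay as eta_k = c / sqrt(k + 1).
    Returns the averaged iterate, which is what standard O(1/sqrt(n)) rates bound.
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    avg = np.zeros_like(x0)
    for k in range(steps):
        g = grad_fn(x) + noise * rng.standard_normal(x.shape)  # noisy gradient
        eta = c / np.sqrt(k + 1)
        x = x * np.exp(-eta * g)   # mirror step under the negative-entropy map
        x /= x.sum()               # Bregman projection back onto the simplex
        avg += x
    return avg / steps

# Toy problem: minimize ||x - x_star||^2 over the probability simplex,
# whose unconstrained minimizer x_star already lies in the simplex.
x_star = np.array([0.7, 0.2, 0.1])
grad = lambda x: 2.0 * (x - x_star)
x0 = np.full(3, 1.0 / 3.0)
x_hat = stochastic_mirror_descent(grad, x0, steps=2000)
```

The multiplicative update followed by normalization is exactly the mirror step induced by the negative-entropy distance-generating function; swapping in the squared Euclidean norm would recover projected stochastic gradient descent.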