How To Make the Gradients Small Stochastically

In convex stochastic optimization, convergence rates for minimizing the objective value are well established. For making the gradient small, that is, finding a point $x$ with $\|\nabla f(x)\| \le \varepsilon$, the best known rate was $O(\varepsilon^{-8/3})$, and improving it was left open. In this paper, we improve this rate to $\tilde{O}(\varepsilon^{-2})$, which is optimal up to logarithmic factors.
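
To make the distinction between the two criteria concrete, the following sketch writes them side by side. It assumes, beyond what the abstract states, that $f$ is convex and $L$-smooth on $\mathbb{R}^d$ with a minimizer $x^*$; the symbols $L$, $x^*$, and $\delta$ are introduced here for illustration only.

```latex
% Objective criterion vs. gradient criterion (illustrative notation).
\begin{align*}
  \text{objective criterion:} \quad & f(x) - f(x^*) \le \delta, \\
  \text{gradient criterion:}  \quad & \|\nabla f(x)\| \le \varepsilon.
\end{align*}
% Standard consequence of convexity and L-smoothness:
\[
  \|\nabla f(x)\|^2 \;\le\; 2L \,\bigl(f(x) - f(x^*)\bigr),
\]
% so an objective error of \delta = \varepsilon^2 / (2L) already implies
% \|\nabla f(x)\| \le \varepsilon.
```

Because this conversion requires an objective error on the order of $\varepsilon^2$, translating a rate for the objective into a rate for the gradient this way is lossy, which is one way to see why the gradient criterion calls for a separate analysis.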

[6]  Zeyuan Allen-Zhu. How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD, 2018, NeurIPS.