Paper Information - How To Make the Gradients Small Stochastically

How To Make the Gradients Small Stochastically

In convex stochastic optimization, convergence rates in terms of minimizing the objective have been well-established. However, in terms of making the gradients small, the best known convergence rate was $O(\varepsilon^{-8/3})$ and it was left open how to improve it. In this paper, we improve this rate to $\tilde{O}(\varepsilon^{-2})$, which is optimal up to log factors.
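To make the distinction between "minimizing the objective" and "making the gradients small" concrete, the following is a minimal sketch that runs plain SGD on a toy convex quadratic with noisy gradients and reports the gradient norm $\|\nabla f(x)\|$ alongside the objective value. It is not the algorithm proposed in the paper; the function name `run_sgd`, the quadratic, the step size, and the noise level are all illustrative assumptions.

```python
import math
import random


def run_sgd(num_steps=2000, step_size=0.05, noise_std=0.1, seed=0):
    """Plain SGD on f(x) = 0.5 * sum_i lam_i * x_i**2 (hypothetical toy problem).

    Returns (initial_grad_norm, final_grad_norm, final_objective).
    The minimizer is x = 0 with f(0) = 0, and f is L-smooth with L = max(lam) = 1.
    """
    rng = random.Random(seed)
    lam = [0.1, 0.325, 0.55, 0.775, 1.0]  # curvature along each coordinate
    x = [1.0] * len(lam)

    def grad_norm(point):
        # Exact gradient of f is (lam_i * x_i)_i; this is the quantity to make small.
        return math.sqrt(sum((l * p) ** 2 for l, p in zip(lam, point)))

    initial_grad_norm = grad_norm(x)
    for _ in range(num_steps):
        # Stochastic gradient = exact gradient + zero-mean Gaussian noise.
        x = [p - step_size * (l * p + rng.gauss(0.0, noise_std))
             for l, p in zip(lam, x)]

    final_objective = 0.5 * sum(l * p * p for l, p in zip(lam, x))
    return initial_grad_norm, grad_norm(x), final_objective


if __name__ == "__main__":
    g0, g_final, f_final = run_sgd()
    print(f"initial gradient norm: {g0:.4f}")
    print(f"final gradient norm:   {g_final:.4f}")
    print(f"final objective gap:   {f_final:.6f}")
```

For an $L$-smooth convex function, $\|\nabla f(x)\|^2 \le 2L\,(f(x) - f(x^*))$, so a small objective gap implies a small gradient, but converting one guarantee into the other this way is generally lossy. That gap is why gradient-norm rates are studied separately.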

Zeyuan Allen-Zhu

[1] Saeed Ghadimi, et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming, 2013, Mathematical Programming.

[2] Nathan Srebro, et al. Tight Complexity Bounds for Optimizing Composite Objectives, 2016, NIPS.

[3] Zeyuan Allen-Zhu, et al. Optimal Black-Box Reductions Between Optimization Objectives, 2016, NIPS.

[4] Avi Wigderson, et al. Much Faster Algorithms for Matrix Scaling, 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[5] Aleksander Madry, et al. Matrix Scaling and Balancing via Box Constrained Newton's Method and Interior Point Methods, 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[6] Zeyuan Allen-Zhu, et al. How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD, 2018, NeurIPS.

[7] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.

[8] Yurii Nesterov, et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, 2012, SIAM J. Optim.

[9] Lin Xiao, et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction, 2014, SIAM J. Optim.

[10] Zeyuan Allen-Zhu, et al. Natasha 2: Faster Non-Convex Optimization Than SGD, 2017, NeurIPS.

[11] Elad Hazan, et al. An optimal algorithm for stochastic strongly-convex optimization, 2010, arXiv:1006.2425.

[12] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.

[13] Yurii Nesterov, et al. Smooth minimization of non-smooth functions, 2005, Math. Program.

[14] Zeyuan Allen-Zhu, et al. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent, 2014, ITCS.

[15] Sébastien Bubeck, et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.

[16] Michael I. Jordan, et al. Non-convex Finite-Sum Optimization Via SCSG Methods, 2017, NIPS.

[17] Zeyuan Allen-Zhu, et al. Katyusha: the first direct acceleration of stochastic gradient methods, 2017, STOC.

[18] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.

[19] A. Nemirovski. Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems, 2004, SIAM J. Optim.

[20] Yoram Singer, et al. Efficient Online and Batch Learning Using Forward Backward Splitting, 2009, J. Mach. Learn. Res.