Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

In this paper, a new theory is developed for first-order stochastic convex optimization, showing that the global convergence rate is sufficiently quantified by a local growth rate of the objective function in a neighborhood of the optimal solutions. In particular, if the objective function F(w) in the ε-sublevel set grows as fast as ‖w − w∗‖₂^{1/θ}, where w∗ represents the closest optimal solution to w and θ ∈ (0, 1] quantifies the local growth rate, the iteration complexity of first-order stochastic optimization for achieving an ε-optimal solution can be Õ(1/ε^{2(1−θ)}), which is optimal at most up to a logarithmic factor. To achieve the faster global convergence, we develop two different accelerated stochastic subgradient methods by iteratively solving the original problem approximately in a local region around a historical solution, with the size of the local region gradually decreasing as the solution approaches the optimal set. Besides the theoretical improvements, this work also includes new contributions towards making the proposed algorithms practical: (i) we present practical variants of the accelerated stochastic subgradient methods that can run without knowledge of the multiplicative growth constant and even of the growth rate θ; (ii) we consider a broad family of problems in machine learning to demonstrate that the proposed algorithms enjoy faster convergence than the traditional stochastic subgradient method. For example, when applied to ℓ1-regularized empirical polyhedral loss minimization (e.g., hinge loss, absolute loss), the proposed stochastic methods have a logarithmic iteration complexity.

Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; Department of Management Sciences, The University of Iowa, Iowa City, IA 52242, USA. Correspondence to: Tianbao Yang <tianbao-yang@uiowa.edu>.

Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).
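The restart-and-shrink scheme described above can be illustrated with a minimal sketch: run stochastic subgradient descent projected onto a ball around the previous stage's average iterate, then halve both the ball radius and the step size between stages. This is only an illustrative sketch under assumed parameter choices (the function name `assg_sketch`, the geometric halving factor, and the stage/iteration counts are ours, not the paper's exact algorithm or constants).

```python
import numpy as np

def assg_sketch(stoch_subgrad, w0, R0, K=5, T=100, eta0=0.1):
    """Sketch of restarted stochastic subgradient descent: each of the K
    stages runs T projected SGD steps inside a ball of radius R around
    the previous stage's average iterate, then halves R and the step size."""
    w_center, R, eta = w0.copy(), R0, eta0
    for _ in range(K):
        w = w_center.copy()
        avg = np.zeros_like(w)
        for _ in range(T):
            g = stoch_subgrad(w)          # unbiased stochastic subgradient
            w = w - eta * g
            d = w - w_center              # project back onto the local ball
            n = np.linalg.norm(d)
            if n > R:
                w = w_center + d * (R / n)
            avg += w
        w_center = avg / T                # restart at the stage average
        R /= 2.0                          # shrink the local region
        eta /= 2.0                        # decrease the step size
    return w_center
```

For a polyhedral objective such as the absolute loss, `stoch_subgrad` can return `sign(xᵢᵀw − yᵢ)·xᵢ` for a randomly sampled example; the shrinking radius is what lets later stages exploit the sharp local growth around the optimum.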
