Adaptive Accelerated Gradient Converging Method under H\"{o}lderian Error Bound Condition

Recent studies have shown that the proximal gradient (PG) method and the accelerated gradient (APG) method with restarting can enjoy linear convergence under a weaker condition than strong convexity, namely a quadratic growth condition (QGC). However, the faster convergence of the restarting APG method relies on the potentially unknown constant in the QGC to restart APG appropriately, which restricts its applicability. We address this issue by developing novel adaptive gradient converging methods, i.e., methods that leverage the magnitude of the proximal gradient as a criterion for restart and termination. Our analysis extends to a much more general condition beyond the QGC, namely the H\"{o}lderian error bound (HEB) condition. {\it The key technique} for our development is a novel synthesis of {\it adaptive regularization and a conditional restarting scheme}, which extends previous work focusing on strongly convex problems to a much broader family of problems. Furthermore, we demonstrate that our results have important implications and applications in machine learning: (i) if the objective function is coercive and semi-algebraic, PG's convergence speed is essentially $o(\frac{1}{t})$, where $t$ is the total number of iterations; (ii) if the objective function consists of an $\ell_1$, $\ell_\infty$, $\ell_{1,\infty}$, or Huber norm regularization and a convex smooth piecewise quadratic loss (e.g., square loss, squared hinge loss, and Huber loss), the proposed algorithm is parameter-free and enjoys a {\it faster linear convergence} than PG without any other assumptions (e.g., a restricted eigenvalue condition). Notably, our linear convergence results for the aforementioned problems are global rather than local. To the best of our knowledge, these improved results are shown for the first time in this work.
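To make the idea of using the magnitude of the proximal gradient as a termination criterion concrete, the following is a minimal sketch of plain PG on an $\ell_1$-regularized least-squares problem. It is not the adaptive restarting algorithm proposed in this work: the objective $\frac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1$, the fixed step size $1/L$, and the names `soft_threshold`, `prox_grad_mapping`, and `pg_lasso` are all assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def prox_grad_mapping(A, b, lam, x, L):
    """One PG step on 0.5*||Ax-b||^2 + lam*||x||_1 with step size 1/L.

    Returns the next iterate x_plus and the proximal gradient
    G(x) = L * (x - x_plus), which is zero exactly at a minimizer.
    """
    grad = A.T @ (A @ x - b)
    x_plus = soft_threshold(x - grad / L, lam / L)
    return x_plus, L * (x - x_plus)


def pg_lasso(A, b, lam, tol=1e-6, max_iter=10000):
    """Plain PG that stops once ||G(x)|| <= tol (illustrative only)."""
    # L is the Lipschitz constant of the gradient of the smooth part:
    # the squared largest singular value of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for t in range(max_iter):
        x_plus, G = prox_grad_mapping(A, b, lam, x, L)
        if np.linalg.norm(G) <= tol:
            return x, t  # terminate on a small proximal gradient
        x = x_plus
    return x, max_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 20))
    b = rng.standard_normal(40)
    x_hat, iters = pg_lasso(A, b, lam=0.1)
    print("iterations:", iters)
```

The stopping test needs no knowledge of the optimal value or of any growth constant, which is what makes the proximal gradient norm attractive as a computable criterion; a restarting scheme could check the same quantity to decide when to restart, though the specific rule used in this work is not reproduced here.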
