Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) inequality is actually weaker than the main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years. We also use the PL inequality to give new analyses of randomized and greedy coordinate descent methods, sign-based gradient descent methods, and stochastic gradient methods in the classic setting (with decreasing or constant step-sizes) as well as the variance-reduced setting. We further propose a generalization that applies to proximal-gradient methods for non-smooth optimization, leading to simple proofs of linear convergence of these methods. Along the way, we give simple convergence results for a wide variety of problems in machine learning: least squares, logistic regression, boosting, resilient backpropagation, L1-regularization, support vector machines, stochastic dual coordinate ascent, and stochastic variance-reduced gradient methods.
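As a minimal sketch of the core result the abstract refers to (the notation here is ours: $f$ is $L$-smooth, $\mu > 0$ is the PL constant, and $f^*$ is the optimal value), the PL inequality requires

\[
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr) \qquad \text{for all } x .
\]

Combining this with the standard descent lemma, gradient descent with step size $1/L$, $x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k)$, satisfies

\[
f(x_{k+1}) \;\le\; f(x_k) - \frac{1}{2L}\,\|\nabla f(x_k)\|^2 \;\le\; f(x_k) - \frac{\mu}{L}\,\bigl(f(x_k) - f^*\bigr),
\]

and hence

\[
f(x_k) - f^* \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{\!k}\,\bigl(f(x_0) - f^*\bigr),
\]

a global linear rate obtained without assuming convexity of $f$. For example, least squares, $f(x) = \tfrac{1}{2}\|Ax - b\|^2$, satisfies the PL inequality for any matrix $A$, even when $f$ is not strongly convex.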
