Convergence rates of an inertial gradient descent algorithm under growth and flatness conditions

In this paper we study the convergence properties of Nesterov's family of inertial schemes, a specific instance of inertial gradient descent algorithms, in the context of smooth convex minimization, under additional hypotheses on the local geometry of the objective function F, such as growth (or Łojasiewicz) conditions. In particular, we study the convergence rates of the objective function values and of the local variation under these geometric conditions. In this setting we give optimal convergence rates for the Nesterov scheme. Our analysis shows that in some situations Nesterov's family of inertial schemes is asymptotically less efficient than gradient descent, e.g. when the objective function is quadratic.
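The last claim can be illustrated numerically. The following sketch (our own illustration, not taken from the paper; the step size h = 0.1 and horizon n = 200 are arbitrary choices) runs plain gradient descent and the classical Nesterov inertial scheme, with momentum coefficient (k-1)/(k+2), on the strongly convex quadratic F(x) = x²/2. Gradient descent contracts geometrically, while the inertial iterates oscillate with a more slowly decaying envelope.

```python
def F(x):
    return 0.5 * x * x   # quadratic objective, L = mu = 1

def gradF(x):
    return x

h = 0.1   # step size, chosen <= 1/L (illustrative value)
n = 200

x_gd = 1.0               # plain gradient descent iterate
x_prev = x_cur = 1.0     # Nesterov inertial iterates
f_nes = []

for k in range(1, n + 1):
    # gradient descent step
    x_gd -= h * gradF(x_gd)
    # Nesterov step with vanishing-damping momentum (k-1)/(k+2)
    y = x_cur + (k - 1) / (k + 2) * (x_cur - x_prev)
    x_prev, x_cur = x_cur, y - h * gradF(y)
    f_nes.append(F(x_cur))

# Gradient descent reaches machine-precision accuracy here, while the
# inertial scheme's recent objective values remain orders of magnitude larger.
print(F(x_gd), max(f_nes[-20:]))
```

On this example, F evaluated at the gradient descent iterate after 200 steps is far below the envelope of the recent inertial objective values, consistent with the asymptotic comparison stated above.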
