Accelerating incremental gradient optimization with curvature information

This paper studies an acceleration technique for the incremental aggregated gradient (IAG) method that exploits curvature information to solve strongly convex finite-sum optimization problems, which arise in large-scale learning applications. Our technique uses a curvature-aided gradient tracking step to produce accurate gradient estimates incrementally from Hessian information. We propose and analyze two methods built on this technique, the curvature-aided IAG (CIAG) method and the accelerated CIAG (A-CIAG) method, which are analogous to the gradient method and Nesterov's accelerated gradient method, respectively. Letting $$\kappa$$ denote the condition number of the objective function, we prove R-linear convergence rates of $$1 - \frac{4c_0 \kappa}{(\kappa+1)^2}$$ for the CIAG method and $$1 - \sqrt{\frac{c_1}{2\kappa}}$$ for the A-CIAG method, where $$c_0, c_1 \le 1$$ are constants inversely proportional to the distance between the initial point and the optimal solution. When the initial iterate is close to the optimal solution, these R-linear rates match those of the gradient and accelerated gradient methods, even though CIAG and A-CIAG operate in an incremental setting with strictly lower per-iteration computation. Numerical experiments confirm our findings. The source code used for this paper can be found at http://github.com/hoitowai/ciag/ .
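
To illustrate the curvature-aided gradient tracking idea, the following is a minimal Python sketch (not the authors' released code; the function names, cyclic component order, step size, and toy quadratic problem are illustrative assumptions). Each component gradient is approximated by a first-order Taylor expansion around the iterate at which that component was last visited, so the aggregated gradient estimate is corrected with stored Hessian (curvature) information.

```python
import numpy as np

def ciag_sketch(grads, hessians, x0, step, n_epochs):
    """Sketch of a curvature-aided incremental aggregated gradient loop.

    grads[i](x), hessians[i](x): gradient/Hessian oracles for component f_i.
    Each iteration visits one component, refreshes its Taylor model, and
    forms the tracked gradient estimate
        sum_i [ grad f_i(x_i) + Hess f_i(x_i) (x - x_i) ],
    where x_i is the iterate at which component i was last visited.
    """
    n = len(grads)
    x = x0.copy()
    x_last = np.tile(x0, (n, 1))                      # last visit point of each f_i
    g_last = np.array([g(x0) for g in grads])         # stored component gradients
    H_last = np.array([H(x0) for H in hessians])      # stored component Hessians
    for k in range(n_epochs * n):
        i = k % n                                     # cyclic component selection
        # refresh the Taylor model of component i at the current iterate
        x_last[i], g_last[i], H_last[i] = x, grads[i](x), hessians[i](x)
        # curvature-aided aggregated gradient estimate
        g_hat = g_last.sum(axis=0) + np.einsum('nij,nj->i', H_last, x - x_last)
        x = x - step * g_hat
    return x

# Toy usage on a strongly convex quadratic sum f(x) = sum_i 0.5 * ||A_i x - b_i||^2
rng = np.random.default_rng(0)
A = [rng.standard_normal((5, 3)) for _ in range(4)]
b = [rng.standard_normal(5) for _ in range(4)]
grads = [lambda x, A=A_i, b=b_i: A.T @ (A @ x - b) for A_i, b_i in zip(A, b)]
hessians = [lambda x, A=A_i: A.T @ A for A_i in A]
x_est = ciag_sketch(grads, hessians, np.zeros(3), step=1e-2, n_epochs=200)
```

Note that each iteration touches only one component's gradient and Hessian oracle, which is what keeps the per-iteration cost incremental; the Hessian correction terms are what allow the aggregated estimate to track the full gradient more accurately than plain IAG.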
