Curvature-aided incremental aggregated gradient method

We propose a new algorithm for finite-sum optimization, which we call the curvature-aided incremental aggregated gradient (CIAG) method. Motivated by the problem of training a classifier in d dimensions with m training samples, where m ≫ d ≫ 1, the CIAG method accelerates incremental aggregated gradient (IAG) methods using curvature (Hessian) information, while avoiding the matrix inversions required by the incremental Newton (IN) method. Specifically, our idea is to exploit the incrementally aggregated Hessian matrix to track the full gradient vector at every incremental step, thereby achieving an improved linear convergence rate over state-of-the-art IAG methods. For strongly convex problems, the fast linear convergence rate requires the objective function to be close to quadratic, or the initial point to be close to the optimal solution. Importantly, we show that one iteration of the CIAG method yields the same improvement in the optimality gap as one iteration of the full gradient method, while the per-iteration complexity is O(d²) for CIAG versus O(md) for the full gradient method. Overall, the CIAG method strikes a balance between the computationally expensive incremental Newton-type methods and the slow IAG method. Our numerical results support the theoretical findings and show that the CIAG method often converges in far fewer iterations than IAG, and requires much shorter running time than IN when the problem dimension is high.
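
The abstract does not spell out the update, so the following is only a minimal Python sketch of the curvature-aided idea as described above: keep, for each component function, the gradient and Hessian evaluated at the last iterate where that component was visited, and use the aggregated Hessians to extrapolate the stored gradients to the current iterate via a first-order Taylor expansion. The function and argument names (ciag, grad, hess, step_size, and the cyclic component order) are illustrative choices, not the paper's actual interface.

```python
import numpy as np


def ciag(grad, hess, theta0, m, step_size, num_passes=10):
    """Minimal sketch of a curvature-aided incremental aggregated gradient loop.

    grad(i, theta) and hess(i, theta) are assumed to return the gradient
    (length-d vector) and Hessian (d-by-d matrix) of the i-th component
    function at theta; m is the number of component functions.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    # Memory: the iterate at which each component was last evaluated.
    theta_mem = np.tile(theta, (m, 1))                       # m x d
    grads = np.stack([grad(i, theta) for i in range(m)])     # m x d
    hessians = np.stack([hess(i, theta) for i in range(m)])  # m x d x d
    # Running aggregates: sum of stored gradients, stored Hessians,
    # and Hessian-times-stored-iterate products.
    g_sum = grads.sum(axis=0)
    H_sum = hessians.sum(axis=0)
    b_sum = np.einsum('ijk,ik->j', hessians, theta_mem)

    for k in range(num_passes * m):
        i = k % m  # cyclic selection of one component per step
        # Refresh the stored gradient/Hessian of component i at the current iterate.
        g_new, H_new = grad(i, theta), hess(i, theta)
        g_sum += g_new - grads[i]
        H_sum += H_new - hessians[i]
        b_sum += H_new @ theta - hessians[i] @ theta_mem[i]
        grads[i], hessians[i], theta_mem[i] = g_new, H_new, theta

        # Curvature-aided (first-order Taylor) estimate of the full gradient:
        #   sum_i [ grad_i(theta_i) + hess_i(theta_i) (theta - theta_i) ],
        # where theta_i is the iterate stored for component i.
        full_grad_est = g_sum + H_sum @ theta - b_sum
        # The step touches only d x d objects: O(d^2) work, no matrix inverse.
        theta = theta - step_size * full_grad_est
    return theta
```

Under this bookkeeping, each iteration multiplies a single d × d aggregate by the current iterate, which is where the O(d²) per-iteration cost quoted above comes from; a full gradient step would instead touch all m components at O(md) cost, and incremental Newton-type methods would additionally solve a d × d linear system.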
