Nesterov's Acceleration For Second Order Method

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low per-iteration computational cost. However, these algorithms may perform poorly when the Hessian is hard to approximate both accurately and efficiently, and to our knowledge there is no effective way to handle this problem. In this paper, we resort to Nesterov's acceleration technique to improve the convergence of a class of second-order methods known as approximate Newton methods. We give a theoretical analysis showing that Nesterov's acceleration improves the convergence of approximate Newton methods, just as it does for first-order methods. Accordingly, we propose an accelerated regularized sub-sampled Newton method. In our experiments, the accelerated algorithm performs much better than the original regularized sub-sampled Newton method, which empirically validates our theory. Moreover, the accelerated regularized sub-sampled Newton method performs comparably to, or even better than, state-of-the-art algorithms.
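To make the idea concrete, the following is a minimal sketch (in Python/NumPy) of how Nesterov-style momentum can be wrapped around a regularized sub-sampled Newton update. The function names `grad` and `hess_sample`, the fixed `momentum` coefficient, and the regularization constant are illustrative assumptions for this sketch, not the paper's exact algorithm or parameter schedule.

```python
import numpy as np

def accelerated_subsampled_newton(grad, hess_sample, x0, n_samples,
                                  sample_size, reg=1e-3, momentum=0.9,
                                  n_iters=100):
    """Sketch of a Nesterov-accelerated regularized sub-sampled Newton loop.

    grad(x):             gradient of the objective at x
    hess_sample(x, idx): Hessian estimate at x built from the sampled indices idx
    """
    d = x0.shape[0]
    x = x0.copy()
    y = x0.copy()  # look-ahead (momentum) point
    for _ in range(n_iters):
        # Sub-sample data points to build an approximate Hessian,
        # then add a regularization term to keep it well conditioned.
        idx = np.random.choice(n_samples, size=sample_size, replace=False)
        H = hess_sample(y, idx) + reg * np.eye(d)
        # Approximate Newton step taken at the look-ahead point.
        x_next = y - np.linalg.solve(H, grad(y))
        # Nesterov-style extrapolation between consecutive iterates.
        y = x_next + momentum * (x_next - x)
        x = x_next
    return x
```

In this sketch the momentum coefficient is held fixed; in practice it would typically follow a schedule tied to the problem's conditioning, as in standard analyses of Nesterov's method.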
