Second-Order Stochastic Optimization for Machine Learning in Linear Time

First-order stochastic methods are the state of the art in large-scale machine learning optimization owing to their efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored because of the high cost of computing second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient-based methods and, in certain settings, improve upon the overall running time of popular first-order methods. Furthermore, our algorithm has the desirable property of being implementable in time linear in the sparsity of the input data.
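The abstract does not spell out the algorithm itself. As a minimal illustrative sketch (not the paper's exact procedure), the Python code below shows how a second-order stochastic step can be computed using only single-sample Hessian-vector products, which for a sparse example cost time proportional to its nonzeros. It estimates the inverse-Hessian-vector product for l2-regularized logistic regression via a truncated, stochastically sampled Neumann series; the loss, function names, and parameter choices are assumptions for illustration only.

import numpy as np

def hvp_single(w, x_i, y_i, v, lam):
    """Hessian-vector product for one logistic-loss sample plus the l2 term.
    For a sparse x_i this costs time proportional to its number of nonzeros."""
    margin = y_i * x_i.dot(w)
    sigma = 1.0 / (1.0 + np.exp(-margin))
    return sigma * (1.0 - sigma) * x_i.dot(v) * x_i + lam * v

def full_gradient(w, X, y, lam):
    """Gradient of the l2-regularized logistic loss over the full data set."""
    margins = y * (X @ w)
    probs = 1.0 / (1.0 + np.exp(margins))   # sigmoid(-margin)
    return -(X.T @ (probs * y)) / len(y) + lam * w

def stochastic_newton_step(w, X, y, lam, depth=50, rng=None):
    """One second-order step: estimate H^{-1} g with the stochastic recursion
    u_{j+1} = g + (I - H_i) u_j, where H_i is the Hessian of a single random sample."""
    rng = np.random.default_rng() if rng is None else rng
    g = full_gradient(w, X, y, lam)
    u = g.copy()
    for _ in range(depth):
        i = rng.integers(len(y))
        u = g + u - hvp_single(w, X[i], y[i], u, lam)
    return w - u                             # Newton-type update with the estimate

# Toy usage: rows are normalized so each per-sample Hessian has spectral norm
# below one, which keeps the Neumann recursion a contraction.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X = X / np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(rng.standard_normal(200))
w = np.zeros(10)
for _ in range(20):
    w = stochastic_newton_step(w, X, y, lam=0.1, rng=rng)

Each inner iteration touches only one random sample, so the per-step cost scales with that sample's sparsity rather than with the full data matrix; this mirrors the linear-in-sparsity property claimed in the abstract, under the assumption that the objective is scaled so the Hessian has spectral norm below one.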
