Accelerated Stochastic Power Iteration

Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires O(1/Δ) full-data passes to recover the principal component of a matrix with eigen-gap Δ. Lanczos, a significantly more complex method, achieves an accelerated rate of O(1/√Δ) passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity O(σ²/Δ²). Unfortunately, they are fully sequential, and they also require O(σ²/Δ²) iterations, far from the O(1/√Δ) rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate O(1/√Δ). We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline settings, that achieve an accelerated iteration complexity of O(1/√Δ). Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time when deployed in a parallel environment. Our approach is very general and applies to many non-convex optimization problems that can now be accelerated using the same technique.
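As a rough illustration of the idea in the abstract, the sketch below shows power iteration with a heavy-ball momentum term (update of the form w_{t+1} = A w_t − β w_{t−1}), plus a naive mini-batch stochastic variant. This is a minimal NumPy sketch, not the authors' implementation; the function names, the batch-sampling scheme, and the momentum choice β ≈ λ₂²/4 are illustrative assumptions for this example.

```python
# Minimal sketch (not the paper's reference implementation): power iteration
# with a heavy-ball momentum term, plus a naive mini-batch stochastic variant.
import numpy as np


def power_iteration_momentum(A, beta, num_iters=100, seed=0):
    """Estimate the top eigenvector of a symmetric matrix A.

    Heavy-ball update:  w_{t+1} = A w_t - beta * w_{t-1},
    followed by a joint rescaling to keep the iterates bounded.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    w_prev = np.zeros(d)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(num_iters):
        w_next = A @ w - beta * w_prev
        scale = np.linalg.norm(w_next)
        w, w_prev = w_next / scale, w / scale  # rescale both iterates by the same factor
    return w / np.linalg.norm(w)


def minibatch_power_iteration_momentum(X, beta, batch_size=64, num_iters=500, seed=0):
    """Stochastic variant: replace A = X^T X / n with a mini-batch estimate.

    As the abstract notes, naively adding momentum to a stochastic method only
    helps when the variance of this estimate is below a "breaking point"
    (e.g. via large batches or an added variance-reduction step).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_prev = np.zeros(d)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(num_iters):
        idx = rng.integers(0, n, size=batch_size)
        B = X[idx]
        Aw = B.T @ (B @ w) / batch_size  # unbiased estimate of (X^T X / n) w
        w_next = Aw - beta * w_prev
        scale = np.linalg.norm(w_next)
        w, w_prev = w_next / scale, w / scale
    return w / np.linalg.norm(w)


if __name__ == "__main__":
    # Tiny usage example on synthetic data with a clear leading component.
    rng = np.random.default_rng(1)
    d, n = 50, 5000
    top = rng.standard_normal(d)
    top /= np.linalg.norm(top)
    X = rng.standard_normal((n, d)) * 0.1 + rng.standard_normal((n, 1)) * top
    A = X.T @ X / n
    eigvals = np.linalg.eigvalsh(A)          # ascending order
    beta = eigvals[-2] ** 2 / 4.0            # classic heavy-ball choice: beta ~ lambda_2^2 / 4
    v = power_iteration_momentum(A, beta)
    print("alignment with planted component:", abs(v @ top))
```

The joint rescaling of the current and previous iterates preserves the direction dynamics of the second-order recurrence while preventing overflow; the embarrassingly parallel structure mentioned in the abstract comes from the fact that the mini-batch matrix-vector products can be computed independently across workers.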
