Matching Matrix Bernstein with Little Memory: Near-Optimal Finite Sample Guarantees for Oja's Algorithm

Microsoft Research India. Email: prajain@microsoft.com
UC Berkeley. Email: chijin@cs.berkeley.edu
University of Washington. Email: sham@cs.washington.edu
Microsoft Research New England. Email: praneeth@microsoft.com
Microsoft Research New England. Email: asid@microsoft.com

This work provides improved guarantees for streaming principal component analysis (PCA). Given $A_1, \ldots, A_n \in \mathbb{R}^{d \times d}$ sampled independently from distributions satisfying $\mathbb{E}[A_i] = \Sigma$ for $\Sigma \succeq 0$, this work provides an $O(d)$-space, linear-time, single-pass streaming algorithm for estimating the top eigenvector of $\Sigma$. The algorithm nearly matches (and in certain cases improves upon) the accuracy obtained by the standard batch method that computes the top eigenvector of the empirical covariance $\frac{1}{n} \sum_{i \in [n]} A_i$, as analyzed by the matrix Bernstein inequality. Moreover, to achieve constant accuracy, our algorithm improves upon the best previously known sample complexities of streaming algorithms by either a multiplicative factor of $O(d)$ or $1/\mathrm{gap}$, where $\mathrm{gap}$ is the relative distance between the top two eigenvalues of $\Sigma$. These results are achieved through a novel analysis of the classic Oja's algorithm, one of the oldest and most popular algorithms for streaming PCA. In particular, this work shows that simply picking a random initial point $w_0$ and applying the update rule $w_{i+1} = w_i + \eta_i A_i w_i$ suffices to accurately estimate the top eigenvector, with a suitable choice of $\eta_i$. We believe our result sheds light on how to efficiently perform streaming PCA both in theory and in practice, and we hope that our analysis may serve as the basis for analyzing many variants and extensions of streaming PCA.
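To make the update rule concrete, below is a minimal Python/NumPy sketch of Oja's algorithm as described above. The function name `oja_top_eigenvector`, the stream interface, and the default step-size schedule $\eta_i = 1/i$ are illustrative assumptions; the paper's guarantees rest on a specific choice of $\eta_i$, and the per-step renormalization is a standard numerical convenience rather than part of the stated update.

```python
import numpy as np

def oja_top_eigenvector(stream, d, eta=None, seed=0):
    """Estimate the top eigenvector of E[A_i] from a stream of d x d matrices.

    `stream` yields symmetric matrices A_1, ..., A_n one at a time, so only
    O(d) working memory is needed for the iterate itself. The schedule
    eta_i = 1/i used by default is a placeholder; the paper's analysis
    prescribes a more careful choice of eta_i.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)          # random initial point w_0
    w /= np.linalg.norm(w)
    for i, A in enumerate(stream, start=1):
        step = eta(i) if eta is not None else 1.0 / i
        w = w + step * (A @ w)          # Oja update: w_{i+1} = w_i + eta_i * A_i * w_i
        w /= np.linalg.norm(w)          # renormalize for numerical stability
    return w

# Hypothetical usage: rank-one samples A_i = x_i x_i^T with E[A_i] = Sigma,
# so the estimate should align with Sigma's top eigenvector (here, e_1).
if __name__ == "__main__":
    d, n = 10, 20000
    rng = np.random.default_rng(1)
    Sigma = np.diag(np.linspace(1.0, 0.1, d))   # distinct top two eigenvalues
    xs = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    w = oja_top_eigenvector((np.outer(x, x) for x in xs), d)
    print(abs(w[0]))                    # alignment with e_1; should be near 1
```

Note that the sketch processes each $A_i$ exactly once and stores only the current iterate, matching the single-pass, $O(d)$-space regime the abstract describes.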
