Online Principal Components Analysis

We consider the online version of the well-known Principal Component Analysis (PCA) problem. In standard PCA, the input to the problem is a set of d-dimensional vectors X = [x_1, ..., x_n] and a target dimension k &lt; d; the output is a set of k-dimensional vectors Y = [y_1, ..., y_n] that minimize the reconstruction error min_Φ Σ_t ||x_t − Φ y_t||_2^2. Here, Φ ∈ R^{d×k} is restricted to being isometric (Φ^T Φ = I_k). The global minimum of this quantity, OPT_k, is obtainable by offline PCA. In online PCA (OPCA) the setting is identical except for two differences: (i) the vectors x_t are presented to the algorithm one by one, and for every presented x_t the algorithm must output a vector y_t before receiving x_{t+1}; (ii) the output vectors y_t are ℓ-dimensional with ℓ ≥ k, to compensate for the handicap of operating online. To the best of our knowledge, this paper is the first to consider this setting of OPCA. Our algorithm produces y_t ∈ R^ℓ with ℓ = O(k · poly(1/ε)) such that ALG ≤ OPT_k + ε||X||_F^2.
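As a point of reference for the guarantee above, the offline baseline OPT_k can be computed directly from the SVD of X. The sketch below (illustrative only; it is the offline computation, not the paper's online algorithm, and all variable names are our own) checks that the best isometric Φ ∈ R^{d×k} is given by the top-k left singular vectors, and that the resulting error equals the sum of the squared tail singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 10, 50, 3
X = rng.standard_normal((d, n))  # columns are x_1, ..., x_n

# Offline PCA: Phi = top-k left singular vectors, an isometric d-by-k matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :k]

# For a fixed isometric Phi, y_t = Phi^T x_t minimizes ||x_t - Phi y_t||^2.
Y = Phi.T @ X
opt_k = np.sum((X - Phi @ Y) ** 2)  # reconstruction error at the optimum

# OPT_k equals the squared Frobenius mass of the trailing singular values.
assert np.isclose(opt_k, np.sum(s[k:] ** 2))
```

An additive bound of the form ALG ≤ OPT_k + ε||X||_F^2 is then meaningful because ||X||_F^2 = Σ_i s_i^2 upper-bounds OPT_k itself.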
