Streaming Sparse Principal Component Analysis

This paper considers estimating the leading k principal components with at most s nonzero attributes from p-dimensional samples collected sequentially in memory-limited environments. We develop and analyze two memory- and computationally efficient algorithms, streaming sparse PCA and streaming sparse ECA, for data generated according to the spiked model and the elliptical model, respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk min{k, s log p}), and sample complexity Θ(s log p). We provide finite-sample performance guarantees for both algorithms, which imply statistical consistency in the high-dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.
