Sparse PCA from Sparse Linear Regression

Sparse Principal Component Analysis (SPCA) and Sparse Linear Regression (SLR) have a wide range of applications and have attracted a tremendous amount of attention in the last two decades as canonical examples of high-dimensional statistical problems. A variety of algorithms have been proposed for both SPCA and SLR, but an explicit connection between the two had not previously been made. We show how to efficiently transform a black-box solver for SLR into an algorithm for SPCA: assuming the SLR solver satisfies the prediction-error guarantees achieved by existing efficient algorithms such as those based on the Lasso, the resulting SPCA algorithm achieves guarantees for testing and for support recovery under the single spiked covariance model that nearly match those of the current best polynomial-time algorithms. Our reduction not only highlights the inherent similarity between the two problems, but also, from a practical standpoint, allows one to obtain a collection of SPCA algorithms directly from known SLR algorithms. We provide experimental results on simulated data comparing our proposed framework to other algorithms for SPCA.
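
To make the reduction concrete, here is a minimal sketch of the recipe the abstract describes: each coordinate is regressed on the remaining coordinates with a black-box SLR solver, and the coordinates whose residual variance drops the most are declared part of the spike's support. This is an illustration under stated assumptions, not the paper's exact procedure or statistics: scikit-learn's Lasso stands in for the SLR solver, and the variance-gap statistic, the helper names (`slr_fit`, `spca_support_from_slr`), and all parameter values are hypothetical choices of ours.

```python
# Minimal sketch: SPCA support recovery via a black-box SLR solver.
# Assumptions (ours, not the paper's): Lasso as the SLR solver, a
# variance-gap selection statistic, and illustrative hyperparameters.
import numpy as np
from sklearn.linear_model import Lasso

def slr_fit(y, Z, alpha=0.1):
    """Black-box SLR solver: in-sample prediction of y from design Z."""
    return Lasso(alpha=alpha).fit(Z, y).predict(Z)

def spca_support_from_slr(X, k):
    """Estimate the k-sparse spike support from an (n x d) sample matrix X."""
    n, d = X.shape
    gap = np.empty(d)
    for i in range(d):
        y = X[:, i]
        Z = np.delete(X, i, axis=1)
        resid = y - slr_fit(y, Z)
        # Under Sigma = I + theta * v v^T, a coordinate in supp(v) is
        # partially predictable from the other support coordinates, so
        # its residual variance drops; off-support coordinates are not.
        gap[i] = np.var(y) - np.var(resid)
    return np.sort(np.argsort(gap)[-k:])  # k coordinates with largest gain

# Usage on simulated spiked-covariance data (hypothetical parameters).
rng = np.random.default_rng(0)
n, d, k, theta = 500, 100, 5, 2.0
v = np.zeros(d)
v[:k] = 1.0 / np.sqrt(k)          # k-sparse unit-norm spike
Sigma = np.eye(d) + theta * np.outer(v, v)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
print(spca_support_from_slr(X, k))  # ideally [0 1 2 3 4]
```

Because the solver is used purely as a black box, any SLR method with comparable prediction-error guarantees could be swapped in for `slr_fit` without changing the rest of the pipeline.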
