Learning Feature Sparse Principal Components

This paper presents new algorithms for the feature-sparsity constrained PCA problem (FSPCA), which performs feature selection and PCA simultaneously. Existing optimization methods for FSPCA require assumptions on the data distribution and lack a global convergence guarantee. Although the general FSPCA problem is NP-hard, we show that, when the covariance matrix has low rank, FSPCA can be solved globally (Algorithm 1). We then propose a second strategy (Algorithm 2) that solves FSPCA for a general covariance matrix by iteratively building a carefully designed proxy. We prove approximation and convergence guarantees for both algorithms. Experimental results on synthetic and real-world datasets show that the new algorithms compare favorably with the state of the art.
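
To make the constraint concrete: FSPCA seeks m orthonormal components supported on at most k shared features, i.e., it maximizes tr(W^T A W) over n-by-m matrices W with orthonormal columns and at most k nonzero rows. The Python sketch below solves this by brute force over row supports; it is exponential in the number of features and is only an illustration of the problem, not the paper's Algorithm 1 or 2 (the function name fspca_brute_force and all parameter names are ours).

    from itertools import combinations
    import numpy as np

    def fspca_brute_force(A, k, m):
        """Exhaustively search all size-k row supports; exact but exponential in n."""
        n = A.shape[0]
        best_val, best_W = -np.inf, None
        for support in combinations(range(n), k):
            idx = list(support)
            sub = A[np.ix_(idx, idx)]          # k x k principal submatrix of the covariance
            vals, vecs = np.linalg.eigh(sub)   # eigenvalues in ascending order
            val = vals[-m:].sum()              # tr(W^T A W) at the optimum on this support
            if val > best_val:
                W = np.zeros((n, m))
                W[idx, :] = vecs[:, -m:]       # top-m eigenvectors embedded on the support
                best_val, best_W = val, W
        return best_val, best_W

    # Usage on a small random covariance: the returned W has orthonormal
    # columns and exactly k nonzero rows (the selected features).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 8))
    A = X.T @ X / 50                           # 8 x 8 sample covariance
    val, W = fspca_brute_force(A, k=3, m=2)
    print(round(val, 4), np.count_nonzero(np.linalg.norm(W, axis=1)))

Note that once the row support is fixed, the problem reduces to ordinary PCA on the corresponding principal submatrix; the combinatorial search over supports is what makes FSPCA hard and what the paper's algorithms are designed to avoid.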
