Learning Feature Sparse Principal Subspace

This paper presents new algorithms for the feature-sparsity-constrained PCA problem (FSPCA), which performs feature selection and PCA simultaneously. Existing optimization methods for FSPCA require assumptions on the data distribution and lack global convergence guarantees. Although the general FSPCA problem is NP-hard, we show that, for a low-rank covariance, FSPCA can be solved globally (Algorithm 1). We then propose another strategy (Algorithm 2) that solves FSPCA for a general covariance by iteratively building a carefully designed proxy. We prove (data-dependent) approximation bounds and convergence guarantees for the new algorithms. When the spectrum of the covariance follows an exponential or Zipf's distribution, we provide an exponential or posynomial approximation bound, respectively. Experimental results show the promising performance and efficiency of the new algorithms compared with state-of-the-art methods on both synthetic and real-world datasets.
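To make the objective concrete, the following sketch illustrates what FSPCA optimizes: pick a support of k features and an m-dimensional principal subspace over those features, maximizing the explained variance. This is not the paper's Algorithm 1 or 2; it is a naive brute-force baseline (the function name `fspca_bruteforce` and all parameters are illustrative), feasible only for tiny feature counts, that makes the NP-hard combinatorial structure visible. For a fixed support, the optimal value is the sum of the top-m eigenvalues of the corresponding principal submatrix of the covariance.

```python
# Brute-force illustration of the FSPCA objective (NOT the paper's
# Algorithm 1/2): find the k-feature support whose principal submatrix
# of the covariance A maximizes the sum of its top-m eigenvalues,
# i.e. max tr(W^T A W) s.t. W^T W = I_m and W has at most k nonzero rows.
import itertools
import numpy as np

def fspca_bruteforce(A, k, m):
    """Exhaustive search over all k-feature supports (requires m <= k)."""
    p = A.shape[0]
    best_val, best_support = -np.inf, None
    for support in itertools.combinations(range(p), k):
        idx = np.array(support)
        sub = A[np.ix_(idx, idx)]  # k x k principal submatrix
        # For a fixed support, the best m-dim subspace attains the
        # sum of the m largest eigenvalues of the submatrix.
        val = np.sort(np.linalg.eigvalsh(sub))[-m:].sum()
        if val > best_val:
            best_val, best_support = val, support
    return best_support, best_val

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
A = X.T @ X / 50                     # sample covariance, p = 6 features
support, val = fspca_bruteforce(A, k=3, m=2)
print(support, val)
```

The exhaustive loop visits all C(p, k) supports, which is exactly the combinatorial explosion the paper's algorithms avoid: Algorithm 1 exploits low-rank structure to reach the global optimum, while Algorithm 2 iterates on a proxy for general covariances.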
