Convex Sparse PCA for Unsupervised Feature Learning

Principal component analysis (PCA) has been widely applied to dimensionality reduction and data pre-processing in engineering, biology, social science, and other fields. Classical PCA and its variants seek linear projections of the original variables that yield low-dimensional feature representations with maximal variance. One limitation is that the results of PCA are difficult to interpret; moreover, classical PCA is vulnerable to noisy data. In this paper, we propose a Convex Sparse Principal Component Analysis (CSPCA) algorithm and apply it to feature learning. First, we show that PCA can be formulated as a low-rank regression optimization problem. Building on this formulation, we incorporate ℓ2,1-norm minimization into the objective function to make the regression coefficients sparse and thereby robust to outliers. In addition, the sparse model used in CSPCA assigns an optimal weight to each of the original features, which gives the output good interpretability: with the output of CSPCA, we can effectively analyze the importance of each feature under the PCA criterion. The new objective function is convex, and we propose an iterative algorithm to optimize it. We apply CSPCA to feature selection and conduct extensive experiments on seven benchmark datasets. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art unsupervised feature selection algorithms.
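The abstract's core recipe — a regression-style reconstruction objective with an ℓ2,1-norm penalty, where the row norms of the coefficient matrix serve as feature-importance weights — can be illustrated with a minimal sketch. This is not the paper's exact CSPCA objective; it assumes a standard ℓ2,1-regularized self-reconstruction problem, min_W ||X − XW||_F² + λ Σ_i ||w_i||₂, solved by the usual iteratively reweighted least-squares scheme, with the function name `l21_feature_scores` and all parameter defaults chosen for illustration:

```python
import numpy as np

def l21_feature_scores(X, lam=1.0, n_iter=30, eps=1e-8):
    """Score features by the row norms of an l2,1-regularized
    self-reconstruction coefficient matrix W (illustrative sketch).

    X   : (n_samples, n_features) data matrix, assumed centered.
    lam : regularization strength for the l2,1 penalty.
    """
    n, d = X.shape
    G = X.T @ X                                   # Gram matrix, (d, d)
    # Initialize W with a ridge solution.
    W = np.linalg.solve(G + lam * np.eye(d), G)
    for _ in range(n_iter):
        # Reweighting step: D_ii = 1 / (2 * ||w_i||_2), smoothed by eps
        # to avoid division by zero for rows driven to zero.
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
        # Closed-form update of the reweighted least-squares problem.
        W = np.linalg.solve(G + lam * D, G)
    # Feature importance = row norms of W; sparse rows score near zero.
    return np.sqrt((W ** 2).sum(axis=1))
```

Feature selection then reduces to keeping the top-k features by score, e.g. `np.argsort(scores)[::-1][:k]`. Larger `lam` drives more coefficient rows toward zero, trading reconstruction fidelity for sparsity and outlier robustness, which mirrors the role the ℓ2,1 term plays in the abstract.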
