Feature Grouping and Sparse Principal Component Analysis

Sparse Principal Component Analysis (SPCA) is widely used in data processing and dimension reduction; it applies a lasso penalty to produce modified principal components with sparse loadings for better interpretability. However, SPCA does not consider an additional grouping structure in which loadings share similar coefficients (i.e., feature grouping), beyond the special group whose coefficients are all zero (i.e., feature selection). In this paper, we propose a novel method, Feature Grouping and Sparse Principal Component Analysis (FGSPCA), which allows the loadings to fall into disjoint homogeneous groups, with sparsity as a special case. FGSPCA is a subspace learning method designed to perform grouping pursuit and feature selection simultaneously by imposing a non-convex regularizer with naturally adjustable sparsity and grouping effects. To solve the resulting non-convex optimization problem, we propose an alternating algorithm that combines difference-of-convex programming, the augmented Lagrangian method, and coordinate descent. Experimental results on real data sets show that FGSPCA benefits from the grouping effect, compared with methods that lack it.
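To make the distinction between feature selection and feature grouping concrete, the following is a minimal, hypothetical sketch, not the authors' FGSPCA implementation. It uses scikit-learn's `SparsePCA` for the lasso-based baseline; the toy data, the `group_loadings` helper, and its `tol` threshold are illustrative assumptions standing in for the non-convex grouping regularizer and the DC-programming/augmented-Lagrangian/coordinate-descent solver described above.

```python
# Hypothetical sketch: sparse loadings (feature selection) vs. a crude
# grouping step (feature grouping). NOT the paper's FGSPCA algorithm.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)

# Toy data: 200 samples, 10 features; features 0-4 share one latent
# factor, features 5-7 share another, features 8-9 are pure noise.
n = 200
f1 = rng.normal(size=(n, 1))
f2 = rng.normal(size=(n, 1))
X = np.hstack([
    f1 @ np.ones((1, 5)),          # group 1: identical loadings expected
    f2 @ np.ones((1, 3)),          # group 2
    rng.normal(size=(n, 2)),       # irrelevant features (the zero group)
]) + 0.1 * rng.normal(size=(n, 10))

# Lasso-based sparse PCA: zeros out irrelevant features but does not
# force the surviving loadings within a group to share a common value.
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
spca.fit(X)
print("sparse loadings:\n", np.round(spca.components_, 3))

def group_loadings(v, tol=0.05):
    """Illustrative grouping step (an assumption, not the paper's method):
    fuse loadings whose sorted values differ by less than `tol` to their
    group mean, so each loading belongs to one homogeneous group."""
    v = v.copy()
    order = np.argsort(v)
    groups, current = [], [order[0]]
    for i, j in zip(order, order[1:]):
        if abs(v[j] - v[i]) < tol:
            current.append(j)      # same group as the previous loading
        else:
            groups.append(current)  # close the group, start a new one
            current = [j]
    groups.append(current)
    for g in groups:
        v[g] = v[g].mean()          # shared coefficient within the group
    return v

grouped = np.vstack([group_loadings(c) for c in spca.components_])
print("grouped loadings:\n", np.round(grouped, 3))
```

On this toy example, plain sparse PCA typically recovers slightly different coefficients within each true group, whereas the fusion step returns exactly shared values; that is the interpretability gain the grouping effect targets, which FGSPCA achieves through its regularizer rather than a post-hoc threshold.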
