Paper information - Evaluating the performance of sparse principal component analysis methods in high-dimensional data scenarios

ABSTRACT High-dimensional datasets have exploded into many fields of research, challenging our interpretation of the classic dimension reduction technique, Principal Component Analysis (PCA). Recently proposed Sparse PCA methods offer useful insight into understanding complex data structures. This article compares three Sparse PCA methods through extensive simulations, with the aim of providing guidelines as to which method to choose under a variety of data structures, as dictated by the variance-covariance matrix. A real gene expression dataset is used to illustrate an application of Sparse PCA in practice and show how to link simulation results with real-world problems.

Joseph Beyene | Ashley J. Bonner
