Evaluating the performance of sparse principal component analysis methods in high-dimensional data scenarios

ABSTRACT

High-dimensional datasets have proliferated across many fields of research, challenging our interpretation of the classic dimension reduction technique, Principal Component Analysis (PCA). Recently proposed Sparse PCA methods offer useful insight into complex data structures. This article compares three Sparse PCA methods through extensive simulations. The aim is to provide guidelines on which method to choose under a variety of data structures, as dictated by the variance-covariance matrix. A real gene expression dataset is used to illustrate Sparse PCA in practice and to show how the simulation results can be linked to real-world problems.