Stability approach to selecting the number of principal components

Principal component analysis (PCA) is a canonical tool for dimensionality reduction: it finds linear transformations that project the data onto a lower-dimensional subspace while preserving as much of the data's variability as possible. Selecting the number of principal components (PCs) is essential but challenging, because PCA is an unsupervised learning problem with no target label at the sample level. In this article, we propose a new method that determines the optimal number of PCs based on the stability of the space spanned by the PCs. A series of analyses on both synthetic and real data demonstrates the superior performance of the proposed method.
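The abstract does not spell out how the stability of the spanned space is measured, so the following is only an illustrative sketch of the general idea and not the authors' procedure. It assumes that stability is assessed by drawing pairs of random subsamples, computing the span of the first k PCs on each, and averaging a distance between the two subspaces. The helper names (`top_k_basis`, `subspace_distance`, `instability`), the subsampling fraction, the distance measure, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def top_k_basis(X, k):
    """Orthonormal basis (p x k) for the span of the first k PCs of X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def subspace_distance(A, B):
    """One minus the mean squared cosine of the principal angles
    between span(A) and span(B); 0 means identical subspaces."""
    k = A.shape[1]
    return 1.0 - np.linalg.norm(A.T @ B, "fro") ** 2 / k

def instability(X, k, n_pairs=20, frac=0.5, seed=0):
    """Average subspace distance over pairs of random subsamples (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)
    dists = []
    for _ in range(n_pairs):
        i = rng.choice(n, size=m, replace=False)
        j = rng.choice(n, size=m, replace=False)
        dists.append(subspace_distance(top_k_basis(X[i], k), top_k_basis(X[j], k)))
    return float(np.mean(dists))

# Synthetic example: 3 strong directions in 10 dimensions plus isotropic noise.
rng = np.random.default_rng(1)
n, p, true_k = 400, 10, 3
W, _ = np.linalg.qr(rng.standard_normal((p, true_k)))         # orthonormal loadings
Z = rng.standard_normal((n, true_k)) * np.array([10.0, 8.0, 6.0])
X = Z @ W.T + rng.standard_normal((n, p))

scores = {k: instability(X, k) for k in range(1, 7)}
for k, s in scores.items():
    print(f"k={k}: instability={s:.4f}")
```

In this toy setting the span of the first three PCs should be nearly identical across subsamples, while adding a fourth direction, which is drawn from isotropic noise, should make the span noticeably less reproducible. How such a stability curve is turned into a selection rule is left open here, since the abstract does not describe that step.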
