Rank selection in noisy PCA with SURE and random matrix theory

Principal component analysis (PCA) is probably the best-known method for dimensionality reduction. Perhaps the most important problem in PCA is determining the number of principal components in a given data set, which in effect separates signal from noise. Many methods have been proposed for this problem, but almost all of them fail in the important practical case where the number of observations is comparable to the number of variables, i.e., the regime of random matrix theory (RMT). In this paper, we propose using Stein's unbiased risk estimator (SURE), with some assistance from RMT, to estimate the number of principal components. The method is applied to simulated data and compared with BIC and the Laplace method.
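To illustrate why this regime is difficult, the following minimal sketch simulates data with a few strong components plus white noise and counts the sample covariance eigenvalues that exceed the upper edge of the Marchenko-Pastur bulk, (1 + sqrt(p/n))^2 times the noise variance. This is not the SURE-based estimator proposed in the paper; it is only a simple RMT threshold rule. The function names, the simulation sizes, and the assumption of a known unit noise variance are all hypothetical choices made for illustration.

```python
import numpy as np


def mp_upper_edge(n_obs, n_vars, noise_var=1.0):
    """Upper edge of the Marchenko-Pastur bulk for white noise of variance noise_var."""
    gamma = n_vars / n_obs  # aspect ratio p/n
    return noise_var * (1.0 + np.sqrt(gamma)) ** 2


def count_above_edge(X, noise_var=1.0):
    """Count sample covariance eigenvalues above the bulk edge (noise variance assumed known)."""
    n_obs, n_vars = X.shape
    Xc = X - X.mean(axis=0)                   # center each variable
    sample_cov = Xc.T @ Xc / n_obs            # p x p sample covariance
    eigvals = np.linalg.eigvalsh(sample_cov)  # real eigenvalues of a symmetric matrix
    return int(np.sum(eigvals > mp_upper_edge(n_obs, n_vars, noise_var)))


# Simulated example (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n_obs, n_vars, true_rank, spike_var = 400, 200, 3, 24.0

# Orthonormal loadings and Gaussian scores for the signal part.
loadings, _ = np.linalg.qr(rng.standard_normal((n_vars, true_rank)))
scores = np.sqrt(spike_var) * rng.standard_normal((n_obs, true_rank))

# Signal plus unit-variance white noise.
X = scores @ loadings.T + rng.standard_normal((n_obs, n_vars))

edge = mp_upper_edge(n_obs, n_vars)
n_above = count_above_edge(X)
print(f"bulk edge: {edge:.3f}, eigenvalues above edge: {n_above}")
```

With p/n = 0.5, pure-noise eigenvalues spread up to roughly 2.9 times the noise variance rather than clustering near 1, so rules that expect a flat noise floor tend to overestimate the rank. At finite sample sizes the largest noise eigenvalue fluctuates around the edge, so a practical threshold rule would typically add a margin.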
