Convergence of Sample Eigenvalues, Eigenvectors, and Principal Component Scores for Ultra-High Dimensional Data.

The development of high-throughput biomedical technologies has led to increased interest in the analysis of high-dimensional data where the number of features is much larger than the sample size. In this paper, we investigate principal component analysis under the ultra-high dimensional regime, where both the number of features and the sample size increase as the ratio of the two quantities also increases. We bridge the existing results from the finite and the high-dimension low sample size regimes, embedding the two regimes in a more general framework. We also numerically demonstrate the universal application of the results from the finite regime.

[1]  B. Nadler Finite sample approximation results for principal component analysis: a matrix perturbation approach , 2009, 0901.3245.

[2]  J. S. Marron,et al.  Geometric representation of high dimension, low sample size data , 2005 .

[3]  I. Johnstone On the distribution of the largest eigenvalue in principal components analysis , 2001 .

[4]  I. Jolliffe Principal Component Analysis , 2002 .

[5]  D. Reich,et al.  Principal components analysis corrects for stratification in genome-wide association studies , 2006, Nature Genetics.

[6]  J. S. Marron,et al.  Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA , 2012, J. Multivar. Anal..

[7]  S. John Some optimal multivariate tests , 1971 .

[8]  S. John The distribution of a statistic used for testing sphericity of normal distributions , 1972 .

[9]  D. Reich,et al.  Population Structure and Eigenanalysis , 2006, PLoS genetics.

[10]  J. W. Silverstein,et al.  Eigenvalues of large sample covariance matrices of spiked population models , 2004, math/0408165.

[11]  V. Marčenko,et al.  DISTRIBUTION OF EIGENVALUES FOR SOME SETS OF RANDOM MATRICES , 1967 .

[12]  F. Wright,et al.  CONVERGENCE AND PREDICTION OF PRINCIPAL COMPONENT SCORES IN HIGH-DIMENSIONAL SETTINGS. , 2012, Annals of statistics.

[13]  J. Marron,et al.  PCA CONSISTENCY IN HIGH DIMENSION, LOW SAMPLE SIZE CONTEXT , 2009, 0911.3827.

[14]  J. Marron,et al.  The high-dimension, low-sample-size geometric representation holds under mild conditions , 2007 .

[15]  D. Paul ASYMPTOTICS OF SAMPLE EIGENSTRUCTURE FOR A LARGE DIMENSIONAL SPIKED COVARIANCE MODEL , 2007 .