PCA consistency for the power spiked model in high-dimensional settings

In this paper, we propose a general spiked model, called the power spiked model, for high-dimensional settings. We derive relations among the data dimension, the sample size, and the high-dimensional noise structure. We first consider the asymptotic properties of the conventional estimator of the eigenvalues. We show that this estimator is directly affected by the high-dimensional noise structure and therefore becomes inconsistent. To overcome this difficulty in high-dimensional situations, we develop new principal component analysis (PCA) methods, called the noise-reduction methodology and the cross-data-matrix methodology, under the power spiked model. We show that the new PCA methods enjoy consistency properties not only for eigenvalues but also for PC directions and PC scores in high-dimensional settings.
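To make the idea of noise reduction concrete, the following is a minimal sketch rather than the paper's exact procedure. It assumes the data are stored as a d x n matrix with samples in columns, and it assumes the noise-reduced estimator takes the form λ̃_j = λ̂_j − (tr(S_D) − Σ_{i≤j} λ̂_i)/(n − 1 − j), where S_D is the n x n dual sample covariance matrix and λ̂_j are its eigenvalues. The abstract does not state this formula, so it should be checked against the full text. The function names `dual_eigenvalues` and `noise_reduced_eigenvalues` are hypothetical.

```python
import numpy as np


def dual_eigenvalues(X):
    """Eigenvalues of the n x n dual sample covariance matrix, descending.

    X is assumed to be d x n (each column is one sample). The nonzero
    eigenvalues coincide with those of the d x d sample covariance matrix,
    but the n x n form is far cheaper to decompose when d >> n.
    """
    d, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)   # center each variable
    S_dual = Xc.T @ Xc / (n - 1)             # n x n dual covariance
    eig = np.linalg.eigvalsh(S_dual)[::-1]   # ascending -> descending
    return np.clip(eig, 0.0, None)           # remove tiny negative round-off


def noise_reduced_eigenvalues(eig, n, m):
    """Assumed noise-reduction correction for the first m eigenvalues.

    For j = 1..m (with m <= n - 2), subtract the average of the remaining
    eigenvalues, which serves as an estimate of the noise contribution:
        eig_j - (sum(eig) - sum(eig[:j])) / (n - 1 - j)
    """
    if m > n - 2:
        raise ValueError("m must be at most n - 2")
    eig = np.asarray(eig, dtype=float)
    total = eig.sum()                        # equals tr(S_dual)
    partial = np.cumsum(eig[:m])             # sum of the leading j eigenvalues
    j = np.arange(1, m + 1)
    return eig[:m] - (total - partial) / (n - 1 - j)


if __name__ == "__main__":
    # Illustrative simulation (all settings here are assumptions):
    # one spiked direction with variance 300 in d = 2000, n = 30.
    rng = np.random.default_rng(0)
    d, n = 2000, 30
    scales = np.ones(d)
    scales[0] = np.sqrt(300.0)
    X = scales[:, None] * rng.standard_normal((d, n))
    lam_hat = dual_eigenvalues(X)
    lam_nr = noise_reduced_eigenvalues(lam_hat, n, m=1)
    print("conventional:", lam_hat[0], "noise-reduced:", lam_nr[0])
```

In this sketch the correction only ever lowers the leading estimates, which reflects the abstract's point that the conventional estimator is inflated by high-dimensional noise. Whether the corrected value is closer to the truth in any single simulated sample depends on sampling variability.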
