Testing the order of a population spectral distribution for high-dimensional data

Large covariance matrices play a fundamental role in many areas of high-dimensional statistics. Investigating the limiting behavior of their eigenvalues can reveal informative structure in large covariance matrices, which is particularly important in high-dimensional principal component analysis and covariance matrix estimation. In this paper, we propose a framework to test the number of distinct population eigenvalues of a large covariance matrix, i.e., the order of its population spectral distribution (PSD). For a PSD of order 2, we derive the limiting distribution of our test statistic and establish its (N, p)-consistency, which is clearly demonstrated in our simulation study. We also apply our test to two classical microarray datasets.
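The notion of "order" can be made concrete with a small numerical sketch. It assumes the usual definition of the PSD of a p x p covariance matrix Σ with eigenvalues λ_1, ..., λ_p, namely H = (1/p) Σ_i δ_{λ_i}, so that the order of H is the number of distinct eigenvalues of Σ. The helper names (psd_atoms, psd_order, hankel_moment_det) and the merging tolerance tol are hypothetical. The Hankel check illustrates a standard moment fact (a discrete distribution with k atoms has a singular (k+1) x (k+1) Hankel matrix of moments); it is not presented as the test statistic of this paper.

```python
import numpy as np


def psd_atoms(sigma, tol=1e-8):
    """Atoms and weights of H = (1/p) * sum_i delta_{lambda_i} for a covariance matrix."""
    eigvals = np.sort(np.linalg.eigvalsh(sigma))
    p = eigvals.size
    atoms, counts = [], []
    for lam in eigvals:
        # Merge eigenvalues that agree up to a relative tolerance (hypothetical choice).
        if atoms and abs(lam - atoms[-1]) <= tol * max(1.0, abs(atoms[-1])):
            counts[-1] += 1
        else:
            atoms.append(lam)
            counts.append(1)
    return np.array(atoms), np.array(counts) / p


def psd_order(sigma, tol=1e-8):
    """Order of the PSD = number of distinct eigenvalues of sigma."""
    return psd_atoms(sigma, tol)[0].size


def hankel_moment_det(atoms, weights, n):
    """Determinant of the (n+1) x (n+1) Hankel matrix of moments m_0, ..., m_{2n}."""
    moments = np.array([np.sum(weights * atoms**j) for j in range(2 * n + 1)])
    hankel = np.array([[moments[i + j] for j in range(n + 1)] for i in range(n + 1)])
    return np.linalg.det(hankel)


# Population covariance with two distinct eigenvalues (1 and 3), each of multiplicity p/2.
p, N = 200, 400
sigma = np.diag(np.r_[np.ones(p // 2), 3.0 * np.ones(p // 2)])
atoms, weights = psd_atoms(sigma)
print(psd_order(sigma))                      # 2
print(hankel_moment_det(atoms, weights, 1))  # approximately 1.0 (= m0*m2 - m1^2)
print(hankel_moment_det(atoms, weights, 2))  # approximately 0: at most 2 atoms

# Sample covariance from N Gaussian draws: its eigenvalues are almost surely all distinct.
rng = np.random.default_rng(0)
X = rng.standard_normal((N, p)) * np.sqrt(np.diag(sigma))
S = X.T @ X / N
print(psd_order(S))
```

Although Σ has only two distinct eigenvalues, the eigenvalues of the sample covariance matrix S spread out over a continuum when p is comparable to N. The order therefore has to be inferred statistically rather than read off from S directly.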
