On the distribution of the largest eigenvalue in principal components analysis

Let $x_{(1)}$ denote the square of the largest singular value of an $n \times p$ matrix $X$, all of whose entries are independent standard Gaussian variates. Equivalently, $x_{(1)}$ is the largest principal component variance of the covariance matrix $X'X$, or the largest eigenvalue of a $p$-variate Wishart distribution on $n$ degrees of freedom with identity covariance. Consider the limit of large $p$ and $n$ with $n/p = \gamma \ge 1$. When centered by $\mu_p = (\sqrt{n-1} + \sqrt{p})^2$ and scaled by $\sigma_p = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p})^{1/3}$, the distribution of $x_{(1)}$ approaches the Tracy-Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for $n$ and $p$ as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large-$p$ multivariate distribution theory may be easier to apply in practice than their fixed-$p$ counterparts.
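
The centering and scaling constants in the abstract can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`tw_centering_scaling`, `largest_eigenvalue_samples`, `standardized_samples`) are hypothetical and chosen only for illustration, and the simulation is not the authors' own code.

```python
import numpy as np


def tw_centering_scaling(n, p):
    """Return (mu_p, sigma_p) as defined in the abstract."""
    a = np.sqrt(n - 1)
    b = np.sqrt(p)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return mu, sigma


def largest_eigenvalue_samples(n, p, reps, seed=0):
    """Simulate x_(1): the squared largest singular value of an
    n x p matrix with independent standard Gaussian entries."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for i in range(reps):
        X = rng.standard_normal((n, p))
        # Singular values are returned in descending order.
        s = np.linalg.svd(X, compute_uv=False)
        out[i] = s[0] ** 2
    return out


def standardized_samples(n, p, reps, seed=0):
    """Center by mu_p and scale by sigma_p; by the stated limit these
    values should be approximately Tracy-Widom (order 1) distributed."""
    mu, sigma = tw_centering_scaling(n, p)
    return (largest_eigenvalue_samples(n, p, reps, seed) - mu) / sigma


if __name__ == "__main__":
    z = standardized_samples(n=40, p=10, reps=2000)
    print("sample mean:", z.mean(), "sample sd:", z.std())
```

For a quick sanity check, `tw_centering_scaling(5, 4)` gives $\mu_p = 16$ and $\sigma_p = 4$, since $\sqrt{n-1} = \sqrt{p} = 2$. The standardized samples can then be compared against tabulated Tracy-Widom quantiles; the order-1 law has mean roughly $-1.21$, so a sample mean in that neighborhood is what the approximation would suggest.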
