High Dimensional Statistical Inference and Random Matrices

Multivariate statistical analysis is concerned with observations on several variables which are thought to possess some degree of inter-dependence. Driven by problems in genetics and the social sciences, it first flowered in the first half of the last century. Subsequently, random matrix theory (RMT) developed, initially within physics, and more recently widely in mathematics. While some of the central objects of study in RMT are identical to those of multivariate statistics, statistical theory was slow to exploit the connection. However, with vast data collection ever more common, data sets now often have as many variables as individuals observed, or more. In such contexts, the techniques and results of RMT have much to offer multivariate statistics. This paper reviews some of the progress to date.
