Spiked covariances and principal components analysis in high-dimensional random effects models

We study principal components analyses in multivariate random and mixed effects linear models, assuming a spherical-plus-spikes structure for the covariance matrix of each random effect. We characterize the behavior of outlier sample eigenvalues and eigenvectors of MANOVA variance components estimators in such models under a high-dimensional asymptotic regime. Our results show that an aliasing phenomenon may occur in high dimensions, in which eigenvalues and eigenvectors of the MANOVA estimate for one variance component may be influenced by the other components. We propose an alternative procedure for estimating the true principal eigenvalues and eigenvectors that asymptotically corrects for this aliasing problem.
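As a rough illustration of the setting, the sketch below simulates the simplest special case, a balanced one-way random effects layout, and computes the classical MANOVA estimates from the between-group and within-group mean squares. The layout, the function names (`simulate`, `manova_one_way`, `aliasing_demo`), and all parameter values are assumptions made for illustration; this is not the general mixed model treated in the paper, and it does not implement the proposed correction procedure.

```python
import numpy as np


def simulate(I, J, p, sigma_a, sigma_e, rng):
    """Draw Y[i, j] = alpha_i + eps_ij with alpha_i ~ N(0, sigma_a), eps_ij ~ N(0, sigma_e)."""
    La = np.linalg.cholesky(sigma_a)
    Le = np.linalg.cholesky(sigma_e)
    alpha = rng.standard_normal((I, p)) @ La.T    # one random effect per group
    eps = rng.standard_normal((I, J, p)) @ Le.T   # residual for every observation
    return alpha[:, None, :] + eps                # shape (I, J, p)


def manova_one_way(Y):
    """MANOVA estimates for a balanced one-way layout; Y has shape (I, J, p)."""
    I, J, p = Y.shape
    group_means = Y.mean(axis=1)                  # (I, p)
    grand_mean = group_means.mean(axis=0)         # (p,)
    B = group_means - grand_mean                  # between-group deviations
    W = (Y - group_means[:, None, :]).reshape(I * J, p)  # within-group deviations
    msb = J * (B.T @ B) / (I - 1)                 # E[msb] = sigma_e + J * sigma_a
    msw = (W.T @ W) / (I * (J - 1))               # E[msw] = sigma_e
    sigma_a_hat = (msb - msw) / J
    sigma_e_hat = msw
    return sigma_a_hat, sigma_e_hat


def aliasing_demo(I=200, J=2, p=200, spike=20.0, seed=0):
    """Put a spike only in sigma_e and inspect the top eigenpair of sigma_a_hat.

    Returns the largest eigenvalue of sigma_a_hat and the squared inner product
    between its eigenvector and the residual spike direction.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(p)
    v[0] = 1.0                                    # hypothetical spike direction
    sigma_a = np.eye(p)                           # spherical, no spike
    sigma_e = np.eye(p) + spike * np.outer(v, v)  # spherical plus one spike
    Y = simulate(I, J, p, sigma_a, sigma_e, rng)
    sigma_a_hat, _ = manova_one_way(Y)
    evals, evecs = np.linalg.eigh(sigma_a_hat)    # eigenvalues in ascending order
    return evals[-1], float((evecs[:, -1] @ v) ** 2)
```

Calling `aliasing_demo()` with `p` comparable to `I` is one way to probe whether the leading eigenvector of the estimate for the group-level component picks up the residual spike direction, even though the true group-level covariance here is spherical. How strongly this shows up depends on the chosen dimensions and spike size.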
