Better Estimates of Genetic Covariance Matrices by “Bending” Using Penalized Maximum Likelihood

Obtaining accurate estimates of the genetic covariance matrix \(\mathbf{\Sigma}_{\mathrm{G}}\) for multivariate data is a fundamental task in quantitative genetics and important for both evolutionary biologists and plant or animal breeders. Classical methods for estimating \(\mathbf{\Sigma}_{\mathrm{G}}\) are well known to suffer from substantial sampling errors; importantly, its leading eigenvalues are systematically overestimated. This article proposes a framework that exploits information in the phenotypic covariance matrix \(\mathbf{\Sigma}_{\mathrm{P}}\) in a new way to obtain more accurate estimates of \(\mathbf{\Sigma}_{\mathrm{G}}\).
The approach focuses on the “canonical heritabilities” (the eigenvalues of \(\mathbf{\Sigma}_{\mathrm{P}}^{-1}\mathbf{\Sigma}_{\mathrm{G}}\)), which may be estimated with more precision than those of \(\mathbf{\Sigma}_{\mathrm{G}}\) because \(\mathbf{\Sigma}_{\mathrm{P}}\) is estimated more accurately. Our method uses penalized maximum likelihood and shrinkage to reduce bias in estimates of the canonical heritabilities. This in turn can be exploited to get substantial reductions in bias for estimates of the eigenvalues of \(\mathbf{\Sigma}_{\mathrm{G}}\) and a reduction in sampling errors for estimates of \(\mathbf{\Sigma}_{\mathrm{G}}\). Simulations show that improvements are greatest when sample sizes are small and the canonical heritabilities are closely spaced. An application to data from beef cattle demonstrates the efficacy of this approach and its effect on estimates of heritabilities and correlations. Penalized estimation is recommended for multivariate analyses involving more than a few traits or problems with limited data.
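
To make the canonical heritabilities concrete, the following minimal sketch computes them for a pair of small, made-up covariance matrices and then pulls them toward their mean. It is an illustration under stated assumptions, not the penalized likelihood procedure of the article: the matrices, the function names, and the fixed `weight` parameter are all hypothetical, and a plain linear shrinkage stands in for the penalty, which the abstract does not specify.

```python
import numpy as np


def canonical_decomposition(sigma_g, sigma_p):
    """Return (lam, t) with sigma_p = t t' and sigma_g = t diag(lam) t'.

    lam holds the eigenvalues of inv(sigma_p) @ sigma_g, i.e. the
    canonical heritabilities, in ascending order.
    """
    chol = np.linalg.cholesky(sigma_p)        # sigma_p = chol chol'
    chol_inv = np.linalg.inv(chol)
    m = chol_inv @ sigma_g @ chol_inv.T       # symmetric, same eigenvalues
    m = (m + m.T) / 2.0                       # remove rounding asymmetry
    lam, u = np.linalg.eigh(m)
    t = chol @ u
    return lam, t


def shrink_toward_mean(lam, weight):
    """Linear shrinkage of lam toward its mean (illustrative stand-in).

    weight = 0 leaves lam unchanged; weight = 1 sets every value to the mean.
    """
    return lam.mean() + (1.0 - weight) * (lam - lam.mean())


# Made-up example matrices for three traits (assumptions, not real data).
sigma_p = np.array([[1.0, 0.3, 0.2],
                    [0.3, 1.0, 0.4],
                    [0.2, 0.4, 1.0]])
sigma_g = np.array([[0.50, 0.10, 0.05],
                    [0.10, 0.30, 0.10],
                    [0.05, 0.10, 0.20]])

lam, t = canonical_decomposition(sigma_g, sigma_p)
lam_shrunk = shrink_toward_mean(lam, weight=0.5)   # weight chosen arbitrarily

# Rebuild a "bent" genetic covariance matrix from the shrunken values.
sigma_g_shrunk = t @ np.diag(lam_shrunk) @ t.T

print("canonical heritabilities:", lam)
print("after shrinkage:         ", lam_shrunk)
```

Because \(\mathbf{\Sigma}_{\mathrm{P}}\) is left untouched in this sketch, only the spread of the canonical heritabilities changes; their mean is preserved. In practice the amount of shrinkage would be chosen by the estimation procedure rather than fixed by hand.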
