Efficient Multiple-Trait Association and Estimation of Genetic Correlation Using the Matrix-Variate Linear Mixed Model

Multiple-trait association mapping, in which multiple traits are used simultaneously in the identification of genetic variants affecting those traits, has recently attracted interest. One class of approaches for this problem builds on classical variance component methodology, utilizing a multitrait version of a linear mixed model. These approaches both increase power and provide insights into the genetic architecture of multiple traits. In particular, it is possible to estimate the genetic correlation, which is a measure of the portion of the total correlation between traits that is due to additive genetic effects. Unfortunately, the practical utility of these methods is limited since they are computationally intractable for large sample sizes. In this article, we introduce a reformulation of the multiple-trait association mapping approach by defining the matrix-variate linear mixed model. Our approach reduces the computational time necessary to perform maximum-likelihood inference in a multiple-trait model by utilizing a data transformation. By utilizing a well-studied human cohort, we show that our approach provides more than a 10-fold speedup, making multiple-trait association feasible in a large population cohort on the genome-wide scale. We take advantage of the efficiency of our approach to analyze gene expression data. By decomposing gene coexpression into a genetic and environmental component, we show that our method provides fundamental insights into the nature of coexpressed genes. An implementation of this method is available at http://genetics.cs.ucla.edu/mvLMM.
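The abstract does not spell out the data transformation, so the following is only a minimal sketch of one standard way such a speedup can be obtained, not the authors' implementation. It assumes a zero-mean model with no fixed effects, vec(Y) ~ N(0, Vg ⊗ K + Ve ⊗ I), where Y is an n × d matrix of traits, K is an n × n kinship matrix, and Vg and Ve are d × d genetic and environmental covariance matrices. All function and variable names are hypothetical. Rotating Y by the eigenvectors of K and by a matrix W that simultaneously diagonalizes Vg and Ve makes every entry of the transformed data independent, so the likelihood no longer requires factorizing an nd × nd covariance matrix. The genetic correlation between two traits is then read off the estimated Vg.

```python
import numpy as np


def genetic_correlation(Vg, i=0, j=1):
    """Genetic correlation between traits i and j from the genetic covariance Vg."""
    return Vg[i, j] / np.sqrt(Vg[i, i] * Vg[j, j])


def loglik_naive(Y, K, Vg, Ve):
    """Reference log-likelihood using the full (n*d) x (n*d) covariance matrix."""
    n, d = Y.shape
    y = Y.flatten(order="F")  # stack the trait columns: vec(Y)
    Sigma = np.kron(Vg, K) + np.kron(Ve, np.eye(n))
    _, logdet = np.linalg.slogdet(Sigma)
    quad = y @ np.linalg.solve(Sigma, y)
    return -0.5 * (n * d * np.log(2 * np.pi) + logdet + quad)


def loglik_transformed(Y, s, U, Vg, Ve):
    """Same log-likelihood after rotating the data so the covariance is diagonal.

    s, U are the eigenvalues and eigenvectors of K (K = U diag(s) U^T),
    assumed to be computed once and reused for every (Vg, Ve) evaluated.
    """
    n, d = Y.shape
    # Find W with W^T Ve W = I and W^T Vg W = diag(lam).
    L = np.linalg.cholesky(Ve)               # Ve = L L^T
    A = np.linalg.solve(L, Vg)               # L^{-1} Vg
    M = np.linalg.solve(L, A.T).T            # L^{-1} Vg L^{-T}
    lam, Q = np.linalg.eigh((M + M.T) / 2)   # symmetrize against round-off
    W = np.linalg.solve(L.T, Q)              # W = L^{-T} Q
    # Rotate individuals by U and traits by W.
    Z = U.T @ Y @ W
    # Entry (a, i) of Z is independent with variance s[a] * lam[i] + 1.
    D = np.outer(s, lam) + 1.0
    ll_z = -0.5 * np.sum(np.log(2 * np.pi * D) + Z**2 / D)
    # Change of variables: log|det(W^T kron U^T)| = n * log|det W|,
    # and log|det W| = -log det L because Q is orthogonal.
    log_abs_det_W = -np.sum(np.log(np.diag(L)))
    return ll_z + n * log_abs_det_W
```

In this sketch the eigendecomposition of K is the only cubic-cost step in n and is performed once; the product U.T @ Y also does not depend on Vg or Ve and could be cached, after which each likelihood evaluation involves only small d × d factorizations and elementwise operations on an n × d array.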
