Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies

Motivated by genome-wide association studies, we consider a standard linear model with one additional random effect in situations where many predictors have been collected on the same subjects and each predictor is analyzed separately. Three novel contributions are (1) a transformation between the linear and log-odds scales which is accurate for the important genetic case of small effect sizes; (2) a likelihood-maximization algorithm that is an order of magnitude faster than the previously published approaches; and (3) efficient methods for computing marginal likelihoods which allow Bayesian model comparison. The methodology has been successfully applied to a large-scale association study of multiple sclerosis including over 20,000 individuals and 500,000 genetic variants.

[1]  Jon Wakefield,et al.  Bayes factors for genome‐wide association studies: comparison with P‐values , 2009, Genetic epidemiology.

[2]  Yurii S. Aulchenko,et al.  BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btm108 Genetics and population analysis GenABEL: an R library for genome-wide association analysis , 2022 .

[3]  Bjarni J. Vilhjálmsson,et al.  Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines , 2010 .

[4]  M. Stephens,et al.  Bayesian statistical methods for genetic association studies , 2009, Nature Reviews Genetics.

[5]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[6]  Edward S. Buckler,et al.  Software engineering the mixed model for genome-wide association studies on large samples , 2009, Briefings Bioinform..

[7]  L. Wasserman,et al.  Genomic control, a new approach to genetic-based association studies. , 2001, Theoretical population biology.

[8]  Andreas Karlsson,et al.  Matrix Analysis for Statistics (2nd ed.), by James R. Schott , 2007 .

[9]  Ying Wang,et al.  Genomewide association study of leprosy. , 2009, The New England journal of medicine.

[10]  Zhiwu Zhang,et al.  Mixed linear model approach adapted for genome-wide association studies , 2010, Nature Genetics.

[11]  D. Reich,et al.  Population Structure and Eigenanalysis , 2006, PLoS genetics.

[12]  D. Heckerman,et al.  Efficient Control of Population Structure in Model Organism Association Mapping , 2008, Genetics.

[13]  Ton J. Cleophas,et al.  Mixed Linear Models , 2013 .

[14]  Jeremiah D. Degenhardt,et al.  A Simple Genetic Architecture Underlies Morphological Variation in Dogs , 2010, PLoS biology.

[15]  Gene H. Golub,et al.  Matrix computations (3rd ed.) , 1996 .

[16]  C. Haley,et al.  Genomewide Rapid Association Using Mixed Model and Regression: A Fast and Simple Method For Genomewide Pedigree-Based Quantitative Trait Loci Association Analysis , 2007, Genetics.

[17]  S. Leal Genetics and Analysis of Quantitative Traits , 2001 .

[18]  R. Fisher XV.—The Correlation between Relatives on the Supposition of Mendelian Inheritance. , 1919, Transactions of the Royal Society of Edinburgh.

[19]  Mark I McCarthy,et al.  Genomic inflation factors under polygenic inheritance , 2011, European Journal of Human Genetics.

[20]  P. Visscher,et al.  Common SNPs explain a large proportion of heritability for human height , 2011 .

[21]  David Heckerman,et al.  Correction for hidden confounders in the genetic analysis of gene expression , 2010, Proceedings of the National Academy of Sciences.

[22]  M. McCarthy,et al.  Genome-wide association studies for complex traits: consensus, uncertainty and challenges , 2008, Nature Reviews Genetics.

[23]  M. McMullen,et al.  A unified mixed-model method for association mapping that accounts for multiple levels of relatedness , 2006, Nature Genetics.

[24]  Edward S. Buckler,et al.  TASSEL: software for association mapping of complex traits in diverse samples , 2007, Bioinform..

[25]  Simon C. Potter,et al.  Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis , 2011, Nature.

[26]  William J. Astle,et al.  Population Structure and Cryptic Relatedness in Genetic Association Studies , 2009, 1010.4681.

[27]  James R. Schott,et al.  Matrix Analysis for Statistics , 2005 .

[28]  L. Penrose,et al.  THE CORRELATION BETWEEN RELATIVES ON THE SUPPOSITION OF MENDELIAN INHERITANCE , 2022 .

[29]  Simon C. Potter,et al.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls , 2007, Nature.

[30]  P. Armitage Tests for Linear Trends in Proportions and Frequencies , 1955 .

[31]  H. Kang,et al.  Variance component model to account for sample structure in genome-wide association studies , 2010, Nature Genetics.