Efficient Control of Population Structure in Model Organism Association Mapping

Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for identifying risk factors related to human diseases. However, genetic association studies in inbred model organisms face the problem of complex population structure among strains. This structure inflates false positive rates, and the inflation cannot be corrected using the standard approaches applied in human association studies, such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods are computationally inefficient. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem that arises when mixed models are applied to association mapping, which allows us to substantially increase computational speed and the reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping, which stems from the small number of available inbred strains, we are able to identify significantly associated SNPs that fall into known QTL or genes identified in previous studies, while avoiding an inflation of false positives. An R package implementation and a webserver for EMMA are publicly available.
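The abstract does not spell out the optimization it exploits, so the following is only a generic sketch of a single-SNP mixed-model fit, not the EMMA implementation. It assumes the model y = Xβ + u + e with Var(u) = σ_g² K for a kinship matrix K and Var(e) = σ_e² I, decomposes K once, and then searches over the single ratio δ = σ_e²/σ_g² on a grid. The function names (`rotate`, `profile_loglik`, `fit`), the maximum-likelihood criterion, and the grid search are illustrative assumptions.

```python
import numpy as np

def rotate(y, X, K):
    """Eigendecompose the kinship matrix once and rotate the data by its eigenvectors."""
    s, U = np.linalg.eigh(K)              # K = U diag(s) U^T
    return s, U.T @ y, U.T @ X

def profile_loglik(log_delta, s, y_rot, X_rot):
    """ML log-likelihood at a given delta, profiled over beta and sigma_g^2."""
    d = s + np.exp(log_delta)             # eigenvalues of K + delta * I
    Xw = X_rot / d[:, None]               # D^{-1} X in the rotated basis
    beta = np.linalg.solve(X_rot.T @ Xw, Xw.T @ y_rot)   # generalized least squares
    r = y_rot - X_rot @ beta
    n = y_rot.size
    sigma_g2 = np.sum(r * r / d) / n      # ML estimate of the genetic variance
    ll = -0.5 * (n * np.log(2 * np.pi * sigma_g2) + np.sum(np.log(d)) + n)
    return ll, beta, sigma_g2

def fit(y, X, K, grid=np.linspace(-10, 10, 201)):
    """Grid search over log(delta); each evaluation costs only vector operations."""
    s, y_rot, X_rot = rotate(y, X, K)
    results = [profile_loglik(g, s, y_rot, X_rot) for g in grid]
    best = max(range(len(grid)), key=lambda i: results[i][0])
    ll, beta, sigma_g2 = results[best]
    return {"delta": float(np.exp(grid[best])), "beta": beta,
            "sigma_g2": sigma_g2, "loglik": ll}
```

In this sketch `X` would hold an intercept column plus the genotype of the SNP under test, so `fit(y, X, K)["beta"][1]` is the estimated SNP effect. Because the eigendecomposition does not depend on the SNP, it could be computed once and reused across all markers.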
