Enrichment of statistical power for genome-wide association studies

Background: The inheritance of most human diseases and agriculturally important traits is controlled by many genes with small effects. Identifying these genes while simultaneously controlling false positives is challenging. Among available statistical methods, the mixed linear model (MLM) has been the most flexible and powerful for controlling population structure and unequal relatedness among individuals (kinship), the two common causes of spurious associations. The introduction of the compressed MLM (CMLM) method provided additional opportunities for optimization by adding two new model parameters: the grouping algorithm and the number of groups.

Results: This study introduces another model parameter to develop an enriched CMLM (ECMLM). The parameter is the algorithm used to define kinship between groups (that is, the kinship algorithm). The ECMLM calculates kinship using several different algorithms and then chooses the best combination of kinship algorithm and grouping algorithm.

Conclusion: Simulations show that the ECMLM increases statistical power. In some cases, the power gained by using ECMLM instead of CMLM is larger than the improvement obtained by using CMLM instead of MLM.
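The idea of defining kinship between groups with more than one algorithm can be illustrated with a small sketch. It is not the authors' implementation: the function name `group_kinship`, the toy matrix, and the choice of mean, median, maximum, and minimum as candidate summaries are assumptions made for illustration. Each candidate collapses the block of individual-level kinship values shared by two groups into a single group-level value.

```python
import numpy as np


def group_kinship(K, groups, summarize=np.mean):
    """Collapse an individual-level kinship matrix into a group-level one.

    K         : (n, n) symmetric kinship matrix among individuals
    groups    : length-n sequence of group labels, one per individual
    summarize : function reducing a block of kinship values to one number
                (hypothetical choices: np.mean, np.median, np.max, np.min)
    """
    K = np.asarray(K, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    G = np.empty((len(labels), len(labels)))
    for a, label_a in enumerate(labels):
        rows = np.flatnonzero(groups == label_a)
        for b, label_b in enumerate(labels):
            cols = np.flatnonzero(groups == label_b)
            # Summarize every pairwise kinship value between the two groups.
            G[a, b] = summarize(K[np.ix_(rows, cols)])
    return G


# Toy data (assumed): four individuals assigned to two groups.
K = np.array([
    [1.0, 0.5, 0.1, 0.2],
    [0.5, 1.0, 0.3, 0.0],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.0, 0.6, 1.0],
])
groups = [0, 0, 1, 1]

# One group-level kinship matrix per candidate kinship algorithm.
candidates = {"mean": np.mean, "median": np.median, "max": np.max, "min": np.min}
group_kinships = {name: group_kinship(K, groups, fn) for name, fn in candidates.items()}
```

In a full analysis, each candidate matrix would be paired with each grouping algorithm, the mixed model would be fitted for every pair, and the best-fitting combination would be kept. The model-fitting and selection step is omitted here because the abstract does not specify the criterion.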
