A fast multi-locus random-SNP-effect EMMA for genome-wide association studies

Although mixed linear models (MLMs), such as efficient mixed model association (EMMA), have been widely used in genome-wide association studies (GWAS), relatively little is known about fast and efficient algorithms for multi-locus GWAS. To address this issue, we report a fast multi-locus random-SNP-effect EMMA (FASTmrEMMA). In this method, a new matrix transformation is constructed to obtain a new genetic model that includes only quantitative trait nucleotide (QTN) variation and normal residual error. To increase computing speed, the number of nonzero eigenvalues is set to one and the polygenic-to-residual variance ratio is fixed. All putative QTNs with P-values ≤ 0.005 in the first step of the new method are included in one multi-locus model for true QTN detection. Owing to the multi-locus feature, the Bonferroni correction is replaced by a less stringent selection criterion. Results from analyses of both simulated and real data showed that FASTmrEMMA has higher power in QTN detection, better model fit, greater robustness, less bias in QTN effect estimation, and shorter running time than current single- and multi-locus GWAS methodologies such as E-BAYES, SUPER, EMMA, CMLM and ECMLM. Therefore, FASTmrEMMA provides an alternative for multi-locus GWAS.
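The two-step flow described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a kinship matrix K and a fixed polygenic-to-residual variance ratio lam, whitens the data with (lam*K + I)^(-1/2) so that only marker variation and independent residual error remain, scans markers one at a time, and then fits all markers with P ≤ 0.005 jointly. All function names are hypothetical. Two simplifications are swapped in: markers are tested as fixed effects with a Wald chi-square test rather than as random effects, and an ordinary least-squares joint fit stands in for the multi-locus estimator. The single-nonzero-eigenvalue speed-up is not reproduced.

```python
import math
import numpy as np

def whitening_matrix(K, lam):
    """Return C = (lam*K + I)^(-1/2) from the eigen-decomposition of K.
    Assumes Cov(y) is proportional to lam*K + I with lam held fixed."""
    d, U = np.linalg.eigh(K)
    d = np.clip(d, 0.0, None)               # guard against tiny negative eigenvalues
    return (U / np.sqrt(lam * d + 1.0)) @ U.T

def marker_pvalues(y_s, x_s, Z_s):
    """Step 1 (simplified): test one marker at a time on whitened data."""
    n, m = Z_s.shape
    pvals = np.empty(m)
    for j in range(m):
        A = np.column_stack([x_s, Z_s[:, j]])
        beta, *_ = np.linalg.lstsq(A, y_s, rcond=None)
        resid = y_s - A @ beta
        s2 = resid @ resid / (n - 2)
        var_b = s2 * np.linalg.inv(A.T @ A)[1, 1]
        stat = beta[1] ** 2 / var_b
        # Upper tail of a chi-square with 1 df: erfc(sqrt(stat / 2))
        pvals[j] = math.erfc(math.sqrt(stat / 2.0))
    return pvals

def two_step_scan(y, Z, K, lam, threshold=0.005):
    """Whiten, screen markers at P <= threshold, then fit survivors jointly."""
    n = len(y)
    C = whitening_matrix(K, lam)
    y_s, x_s, Z_s = C @ y, C @ np.ones(n), C @ Z
    pvals = marker_pvalues(y_s, x_s, Z_s)
    selected = np.flatnonzero(pvals <= threshold)
    if selected.size == 0:
        return selected, np.empty(0)
    # Step 2 (stand-in): joint least-squares fit of all putative QTNs
    A = np.column_stack([x_s, Z_s[:, selected]])
    beta, *_ = np.linalg.lstsq(A, y_s, rcond=None)
    return selected, beta[1:]                # drop the intercept
```

Calling `two_step_scan(y, Z, K, lam)` with a phenotype vector `y`, an n-by-m marker matrix `Z` and an n-by-n kinship matrix `K` returns the indices of the screened markers and their joint effect estimates.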
