MatrixEpistasis: ultrafast, exhaustive epistasis scan for quantitative traits with covariate adjustment

Motivation: For many traits, causal loci uncovered by genetic mapping studies explain only a minority of the heritable contribution to trait variation. Multiple explanations for this 'missing heritability' have been proposed. Single nucleotide polymorphism (SNP)-SNP interaction (epistasis), one of the compelling models, has been widely studied. However, a genome-wide scan for epistasis, especially for quantitative traits, poses huge computational challenges. Moreover, covariate adjustment is largely ignored in epistasis analysis because of the massive extra computation it requires.

Results: In the current study, we found striking differences among epistasis models using both simulated data and real biological data, suggesting that covariate adjustment not only removes confounding bias but can also improve power. Furthermore, we derived mathematical formulas that allow the exhaustive epistasis scan, together with full covariate adjustment, to be expressed in terms of large matrix operations, substantially improving computational efficiency (roughly 10^4× faster than existing methods). We call the new method MatrixEpistasis. With MatrixEpistasis, we re-analyze a large real yeast dataset comprising 11 623 SNPs, 1008 segregants and 46 quantitative traits with covariates fully adjusted, and detect thousands of novel putative epistatic interactions with P-values < 1.48e-10.

Availability and implementation: The method is implemented in R and available at https://github.com/fanglab/MatrixEpistasis.

Supplementary information: Supplementary data are available at Bioinformatics online.
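To make the "large matrix operations" idea concrete, the following is a minimal sketch of how a covariate-adjusted pairwise interaction test can be reduced to correlations and matrix products. It is not taken from the paper. The notation is assumed for illustration: $n$ samples, a phenotype vector $y$, a genotype matrix $X$ with SNP columns $x_i$, and a covariate matrix $C$ with $q$ columns including the intercept. The sketch uses only the standard Frisch-Waugh-Lovell result and the recursive partial-correlation formula, and the authors' actual derivation may differ in its details.

```latex
% Assumed model for SNP pair (i, j); \circ is the element-wise product.
y = C\gamma + \beta_1 x_i + \beta_2 x_j + \beta_3 (x_i \circ x_j) + \varepsilon,
\qquad H_0:\ \beta_3 = 0 .

% Step 1: remove covariates with the residual-maker matrix M.
M = I - C (C^\top C)^{-1} C^\top, \qquad
a = M y, \quad b = M (x_i \circ x_j), \quad c = M x_i, \quad d = M x_j .

% Step 2: since C contains the intercept, a, b, c, d have mean zero, so
% r_{uv} = u^\top v / (\lVert u \rVert \, \lVert v \rVert) is a correlation.
% Condition on x_i, then on x_j, with the recursive formula:
r_{uv \cdot c} = \frac{r_{uv} - r_{uc}\, r_{vc}}
                      {\sqrt{(1 - r_{uc}^2)(1 - r_{vc}^2)}}, \qquad
r_{ab \cdot cd} = \frac{r_{ab \cdot c} - r_{ad \cdot c}\, r_{bd \cdot c}}
                       {\sqrt{(1 - r_{ad \cdot c}^2)(1 - r_{bd \cdot c}^2)}} .

% Step 3: the t-statistic for \beta_3 in the full model follows from the
% partial correlation, with n - q - 3 residual degrees of freedom.
t_{ij} = r_{ab \cdot cd}\, \sqrt{\frac{n - q - 3}{1 - r_{ab \cdot cd}^2}} .

% Why this vectorises: M is symmetric and idempotent, so M a = a and
a^\top b = a^\top (x_i \circ x_j) = \bigl[ X^\top \operatorname{diag}(a)\, X \bigr]_{ij},
% i.e. one matrix product yields this inner product for all SNP pairs at once.
```

Under these assumptions, no per-pair regression is fitted: the covariates are projected out once, and the quantities needed for every SNP pair come from a small number of dense matrix products, which is the kind of operation optimized linear algebra libraries handle efficiently.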
