High-throughput analysis of epistasis in genome-wide association studies with BiForce

Motivation: Gene–gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) because of the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS.

Results: We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case–control) traits. BiForce achieves high computational efficiency by using memory-efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects, using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce by analysing eight metabolic traits in one GWAS cohort (323 697 SNPs, >4500 individuals) and two disease traits in another (>340 000 SNPs, >1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed the analyses of the eight metabolic traits within 1 day, and identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits.

Availability and implementation: The software is free and can be downloaded from http://bioinfo.utu.fi/BiForce/.

Contact: wenhua.wei@igmm.ed.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
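To illustrate how Boolean bitwise operations can speed up a pair-wise scan, the sketch below builds the 3 x 3 joint genotype table for two SNPs with bitwise AND and bit counting. It is a minimal illustration of the general technique, not BiForce's actual implementation: the function names `pack_genotypes` and `pair_table`, the 0/1/2 genotype coding, the absence of missing genotypes and the toy data are all assumptions made for this example.

```python
def pack_genotypes(genotypes):
    """Return three integer bitmasks, one per genotype class (0, 1, 2).

    Bit i of masks[g] is set when individual i carries genotype g.
    Assumes every genotype is coded 0, 1 or 2 (no missing values).
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def pair_table(snp_a, snp_b):
    """Build the 3 x 3 joint genotype count table for two SNPs.

    Each cell is the number of set bits in the AND of two bitmasks,
    so all individuals are counted with one AND per cell instead of
    one comparison per individual.
    """
    a = pack_genotypes(snp_a)
    b = pack_genotypes(snp_b)
    return [[bin(a[i] & b[j]).count("1") for j in range(3)] for i in range(3)]


# Toy data (hypothetical): genotypes of 8 individuals at two SNPs.
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 0, 2, 1, 1, 2, 1, 0]

table = pair_table(snp_a, snp_b)
for row in table:
    print(row)
```

In a real scan the bitmasks would be packed once per SNP and reused across all pairs, so the cost of each pair is dominated by nine AND-and-count operations. The resulting table is the input to whichever interaction test is applied, and splitting individuals by case or control status only requires one further AND with a status mask.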
