Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems

High-throughput genotyping technologies (such as SNP arrays) allow the rapid collection of up to a few million genetic markers from an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies (GWAS) is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively long runtimes; e.g., processing a moderately sized dataset consisting of about 500,000 SNPs and 5,000 samples requires several days using state-of-the-art tools on a standard 3 GHz CPU. In this paper, we demonstrate how this task can be accelerated using a combination of fine-grained and coarse-grained parallelism on two different computing systems. The first architecture is based on reconfigurable hardware (FPGAs), while the second uses multiple GPUs connected to the same host. We show that both systems can achieve speedups of around four orders of magnitude compared to the sequential implementation. This reduces the runtime for detecting epistasis to only a few minutes for moderately sized datasets and to a few hours for large-scale datasets.
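
To give a sense of scale for the pairwise workload described above, the following is a minimal Python sketch, not the implementation from the paper. It counts the SNP pairs in a dataset and builds, for each pair, the genotype-by-genotype contingency table for cases and controls that pairwise interaction tests are typically computed from. The function names (`num_pairs`, `contingency_table`, `all_pair_tables`), the 0/1/2 genotype encoding, and the 0/1 phenotype encoding are assumptions made for illustration; the abstract does not specify the statistical test or the data layout.

```python
from itertools import combinations
from math import comb


def num_pairs(n_snps):
    """Number of unordered SNP pairs that an exhaustive 2-SNP scan must test."""
    return comb(n_snps, 2)


def contingency_table(geno_a, geno_b, phenotype):
    """Count samples per (phenotype, genotype A, genotype B) cell.

    Assumed encoding (hypothetical, not taken from the paper):
    genotypes are 0, 1, or 2; phenotype is 0 (control) or 1 (case).
    Returns a 2 x 3 x 3 nested list indexed as table[phenotype][a][b].
    """
    table = [[[0] * 3 for _ in range(3)] for _ in range(2)]
    for a, b, y in zip(geno_a, geno_b, phenotype):
        table[y][a][b] += 1
    return table


def all_pair_tables(genotypes, phenotype):
    """Yield ((i, j), table) for every unordered pair of SNPs.

    Each pair is independent of all others, which is what makes the
    scan amenable to coarse-grained parallelism across devices.
    """
    for i, j in combinations(range(len(genotypes)), 2):
        yield (i, j), contingency_table(genotypes[i], genotypes[j], phenotype)


# Pair count for the dataset size quoted in the abstract (about 500,000 SNPs).
print(num_pairs(500_000))

# Toy data: 3 SNPs x 4 samples, two cases and two controls.
toy_genotypes = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1]]
toy_phenotype = [1, 0, 1, 0]
toy_tables = dict(all_pair_tables(toy_genotypes, toy_phenotype))
```

For roughly 500,000 SNPs this comes to about 1.25 × 10^11 pairs, each requiring one pass over all samples, which is consistent with the multi-day sequential runtimes mentioned above.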
