GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

BackgroundGene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits.FindingsHere we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run.ConclusionsGENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.

[1]  Mark I. McCarthy,et al.  Concept, Design and Implementation of a Cardiovascular Gene-Centric 50 K SNP Array for Large-Scale Genomic Association Studies , 2008, PloS one.

[2]  Pradeep Dubey,et al.  Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.

[3]  P. Donnelly,et al.  Genome-wide strategies for detecting multiple loci that influence complex diseases , 2005, Nature Genetics.

[4]  Klaus Schulten,et al.  Accelerating Molecular Modeling Applications with GPU Computing , 2009 .

[5]  Hua Zhou,et al.  Graphics Processing Units and High-Dimensional Optimization. , 2010, Statistical science : a review journal of the Institute of Mathematical Statistics.

[6]  Jing He,et al.  Gene-based interaction analysis by incorporating external linkage disequilibrium information , 2010, European Journal of Human Genetics.

[7]  Brett A. McKinney,et al.  Real-world comparison of CPU and GPU implementations of SNPrank: a network analysis tool for GWAS , 2011, Bioinform..

[8]  Amitabh Varshney,et al.  High-throughput sequence alignment using Graphics Processing Units , 2007, BMC Bioinformatics.

[9]  Manuel A. R. Ferreira,et al.  PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.

[10]  Lorenzo Dematté,et al.  GPU computing for systems biology , 2010, Briefings Bioinform..

[11]  H. Cordell Detecting gene–gene interactions that underlie human diseases , 2009, Nature Reviews Genetics.

[12]  Ioannis Xenarios,et al.  FastEpistasis: a high performance computing solution for quantitative trait epistasis , 2010, Bioinform..

[13]  Uri C. Weiser,et al.  Proceedings of the 37th annual international symposium on Computer architecture , 2010, ISCA 2010.