Discovering epistasis in large scale genetic association studies by exploiting graphics cards

Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes underlying disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk, for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable because even modest-sized SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed explicitly to overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPUs). We include tutorials on GPU technology, which convey why GPUs are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores available on a single GPU. Whereas high-end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from NVIDIA feature over 2600 cores, and each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems; however, epistasis screens fare well because of the high degree of parallelism these problems expose. The papers we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
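To make the combinatorial scale and the source of parallelism concrete, the following is a minimal Python sketch, not code from any of the reviewed programs. It counts the unordered SNP pairs in an exhaustive two-locus scan and scores each pair independently. The function names (`pair_count`, `pair_chi_square`, `exhaustive_scan`), the 0/1/2 genotype coding, and the choice of a Pearson chi-square on the 9 x 2 joint-genotype table are all assumptions made for illustration; that statistic measures joint association and is only a stand-in for a proper interaction test.

```python
from itertools import combinations
from math import comb


def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs in an exhaustive two-locus scan."""
    return comb(n_snps, 2)


def pair_chi_square(g1, g2, status):
    """Pearson chi-square on the 9 x 2 table of joint genotype vs. case status.

    Assumed coding (hypothetical, for illustration): genotypes are 0, 1 or 2
    copies of the minor allele; status is 0 for controls and 1 for cases.
    """
    # Rows index the 9 joint genotypes (3 * g1 + g2); columns index status.
    table = [[0, 0] for _ in range(9)]
    for a, b, y in zip(g1, g2, status):
        table[3 * a + b][y] += 1

    n = len(status)
    col_totals = [sum(row[k] for row in table) for k in (0, 1)]

    stat = 0.0
    for row in table:
        row_total = row[0] + row[1]
        for k in (0, 1):
            expected = row_total * col_totals[k] / n
            if expected > 0:  # skip empty rows or columns
                stat += (row[k] - expected) ** 2 / expected
    return stat


def exhaustive_scan(genotypes, status):
    """Score every SNP pair.

    No pair depends on the result of any other pair, which is the
    independence that lets a GPU assign pairs to separate threads.
    """
    return {
        (i, j): pair_chi_square(genotypes[i], genotypes[j], status)
        for i, j in combinations(range(len(genotypes)), 2)
    }


if __name__ == "__main__":
    # Pairwise models implied by a 500k-marker panel.
    print(pair_count(500_000))

    # Toy data: 3 SNPs genotyped in 4 individuals (2 controls, 2 cases).
    toy_genotypes = [[0, 0, 1, 1], [0, 0, 0, 0], [2, 1, 0, 1]]
    toy_status = [0, 0, 1, 1]
    for pair, score in exhaustive_scan(toy_genotypes, toy_status).items():
        print(pair, round(score, 3))
```

A 500k-marker panel yields roughly 1.25 x 10^11 pairs under this count, and the loop in `exhaustive_scan` has no dependency between iterations, so each pair could in principle be handed to its own GPU thread.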
