Accelerating Large-Scale Genome-Wide Association Studies With Graphics Processors

Large-scale Genome-Wide Association Studies (GWAS) are a Big Data application because of the large volume of data to process and the high computational intensity. Furthermore, numerical issues (e.g., floating-point underflow) limit the data scale in some applications. Graphics Processors (GPUs) have been used to accelerate genomic data analytics, such as sequence alignment, Single-Nucleotide Polymorphism (SNP) detection, and Minor Allele Frequency (MAF) computation. As MAF computation is the most time-consuming task in GWAS, the authors discuss in detail their techniques for accelerating this task on the GPU. They first present a reduction-based algorithm that matches the GPU’s data parallelism better than the original algorithm implemented in the CPU-based tool. They then implement this algorithm efficiently on the GPU by carefully optimizing local memory utilization and avoiding user-level synchronization. Because the MAF computation suffers from floating-point underflow, the authors transform the computation into logarithm space. In addition to the MAF computation, they briefly introduce GPU-accelerated sequence alignment and SNP detection. The experimental results show that the GPU-based GWAS implementations can accelerate state-of-the-art CPU-based tools by up to an order of magnitude.

Mian Lu, Institute of High Performance Computing, A*STAR, Singapore
Qiong Luo, Hong Kong University of Science and Technology, Hong Kong
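The two ideas named in the abstract, a reduction-style summation and a move to logarithm space to avoid underflow, can be illustrated with a minimal CPU-side sketch. This is not the authors' GPU kernel: the function names `log_add` and `log_sum_reduce` and the toy likelihood values are assumptions made for illustration only. The sketch sums terms that are each too small to represent as a double by keeping only their logarithms and combining them pairwise, in the same tree pattern a parallel reduction would follow.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)) while staying in log space."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = (a, b) if a >= b else (b, a)
    # exp(lo - hi) is at most 1, so it cannot overflow; log1p keeps precision.
    return hi + math.log1p(math.exp(lo - hi))


def log_sum_reduce(log_terms):
    """Pairwise (tree) reduction over log-space terms.

    Each pass halves the number of values, which mirrors the access
    pattern of a data-parallel reduction (illustrative, sequential here).
    """
    vals = list(log_terms)
    if not vals:
        return -math.inf  # log of an empty sum, i.e. log(0)
    while len(vals) > 1:
        nxt = [log_add(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])  # carry the unpaired element forward
        vals = nxt
    return vals[0]


# Hypothetical input: four terms, each a product of 2000 likelihoods of 0.5.
direct_term = 0.5 ** 2000            # underflows to 0.0 in double precision
log_term = 2000 * math.log(0.5)      # the same quantity, representable in log space
log_total = log_sum_reduce([log_term] * 4)
```

In linear space the four terms would sum to exactly zero, losing all information, whereas `log_total` holds the logarithm of the true sum. The pairwise structure is chosen only to echo a reduction; a real GPU implementation would also have to address memory layout and synchronization, which this sketch does not model.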
