Ultra-fast Multiple Genome Sequence Matching Using GPU

Abstract—This paper presents a comparative evaluation of massively parallel implementations of the suffix tree and the suffix array for accelerating genome sequence matching, on an Intel Core i7 3770K quad-core CPU and an NVIDIA GeForce GTX680 GPU. Besides requiring only approximately 20% to 30% of the space of the suffix tree, the suffix array clearly outperforms the suffix tree on the GPU thanks to coalesced binary search and tile optimization. The experimental results show that multiple genome sequence matching based on the suffix array achieves a speedup of more than 99 times over the serial CPU implementation. The results indicate that massively parallel matching based on the suffix array is an efficient approach for high-performance bioinformatics applications.

Keywords—binary search, bioinformatics, GPU, suffix array, suffix tree

I. INTRODUCTION

In recent years, modern multi-core and many-core architectures have been revolutionizing high performance computing (HPC). As more and more processor cores are incorporated into a single chip, the era of the many-core processor is coming. The emergence of many-core architectures, such as compute unified device architecture (CUDA)-enabled GPUs [1], and of other accelerator technologies, such as field-programmable gate arrays (FPGAs) and the Cell/BE, opens up the possibility of significantly reducing the runtime of many biological algorithms on commonly available, inexpensive hardware that offers greater computing power. Since the introduction of CUDA in 2007, more than 100 million computers with CUDA-capable Graphics Processing Units have been shipped to end users. In this golden age of GPU computing, with such a low barrier to entry, researchers all over the world have been developing new algorithms and applications that exploit the extreme floating-point execution throughput of these GPUs.

Life science has emerged as a primary application area for GPU computing. High-throughput techniques for DNA sequencing and gene expression analysis have led to an explosion of biological data. Prominent examples are the growth of DNA sequence information in NCBI's GenBank database and the growth of protein sequences in the UniProtKB/TrEMBL database. Furthermore, emerging next-generation sequencing technologies [2] have broken many experimental barriers to genome-scale sequencing. Because GPU performance is growing faster than CPU performance, the use of GPUs in bioinformatics is an appropriate strategy.

The suffix tree of a string is the compact trie of all suffixes of the string; it is widely used in bioinformatics applications [3], e.g., MUMmer [4] and MUMmerGPU [5]. There are several approaches to constructing the suffix tree in linear time [6] [7] [8]. Nevertheless, as the reference sequence grows, the Dynamic Random Access Memory consumption of the suffix tree becomes a bottleneck. Because the suffix array uses cache memory efficiently and takes only about 20% to 30% of the space of the suffix tree, the suffix array is sometimes preferred to the suffix tree on GPUs, i.e., genome sequence matching can be solved efficiently with the suffix array. Meanwhile, serial CPU algorithms exist that construct the suffix array in linear time [9] [10]. In this paper, GPU implementations (suffix tree and suffix array) and their optimizations are presented to accelerate multiple genome sequence matching, and two platforms are compared: multi-core (CPU) and many-core (GPU). The GPU implementations show a tremendous performance boost: the suffix array achieves a speedup of more than 99 times over the serial CPU implementation, and the suffix tree achieves a speedup of approximately 44-fold.

II. M
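
To make the suffix array matching discussed in the Introduction concrete, the following is a minimal CPU-side sketch in Python, not the GPU implementation evaluated in this paper. It builds a suffix array by naive sorting (an assumption made for brevity; the linear-time constructions cited above are not reproduced) and locates every occurrence of a query with two binary searches. The function names `build_suffix_array`, `find_occurrences`, and `match_all` are hypothetical and chosen only for illustration.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction by sorting, for illustration only; it is far slower
    than the linear-time algorithms referenced in the text.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], query: str) -> List[int]:
    """Return the sorted start positions of `query` in `text` via binary search on `sa`."""
    m = len(query)

    # Lower bound: first suffix whose length-m prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > query.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[first:lo] start with the query.
    return sorted(sa[first:lo])


def match_all(reference: str, queries: List[str]) -> List[List[int]]:
    """Match every query against one reference; each query is independent work."""
    sa = build_suffix_array(reference)
    return [find_occurrences(reference, sa, q) for q in queries]


if __name__ == "__main__":
    reference = "GATTACAGATTACA"
    for query, hits in zip(["GATTACA", "TTA", "CCC"],
                           match_all(reference, ["GATTACA", "TTA", "CCC"])):
        print(query, hits)
```

Because the suffix array is built once and each query then needs only read-only binary searches over it, the queries do not depend on one another, which is the property that makes multiple sequence matching amenable to massively parallel execution.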

[1] Dan Gusfield, et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.

[2] S. Salzberg, et al. Versatile and open software for comparing large genomes, 2004, Genome Biology.

[3] Kurt Keutzer, et al. The Concurrency Challenge, 2008, IEEE Design & Test of Computers.

[4] Ronald L. Rivest, et al. Introduction to Algorithms, third edition, 2009.

[5] Nuno Roma, et al. Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays, 2011, 2011 International Conference on High Performance Computing & Simulation.

[6] Ulf Assarsson, et al. Fast parallel GPU-sorting using a hybrid algorithm, 2008, J. Parallel Distributed Comput.

[7] Jie Cheng, et al. CUDA by Example: An Introduction to General-Purpose GPU Programming, 2010, Scalable Comput. Pract. Exp.

[8] Esko Ukkonen, et al. On-line construction of suffix trees, 1995, Algorithmica.

[9] David R. Kaeli, et al. Heterogeneous Computing with OpenCL - Revised OpenCL 1.2 Edition, 2012.

[10] William F. Smyth, et al. A taxonomy of suffix array construction algorithms, 2007, CSUR.

[11] Peter Sanders, et al. Linear work suffix array construction, 2006, JACM.

[12] S. Muthukrishnan, et al. On the sorting-complexity of suffix tree construction, 2000, JACM.

[13] Edward M. McCreight, et al. A Space-Economical Suffix Tree Construction Algorithm, 1976, JACM.

[14] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.

[15] Amitabh Varshney, et al. High-throughput sequence alignment using Graphics Processing Units, 2007, BMC Bioinformatics.

[16] J. Kulpa, et al. Time-frequency analysis using NVIDIA compute unified device architecture (CUDA), 2009, Symposium on Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments (WILGA).

[17] Hanlee P. Ji, et al. Next-generation DNA sequencing, 2008, Nature Biotechnology.

[18] Jie Cheng, et al. Programming Massively Parallel Processors. A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.