Paper Information - Ultra-fast Multiple Genome Sequence Matching Using GPU

Ultra-fast Multiple Genome Sequence Matching Using GPU

Abstract: This paper presents a comparative evaluation of massively parallel implementations of the suffix tree and the suffix array for accelerating genome sequence matching, carried out on an Intel Core i7 3770K quad-core CPU and an NVIDIA GeForce GTX680 GPU. The suffix array occupies only approximately 20% to 30% of the space of the suffix tree, and in addition the coalesced binary search and tile optimization make the suffix array clearly outperform the suffix tree on the GPU. The experimental results show that multiple genome sequence matching based on the suffix array achieves a speedup of more than 99 times over the serial CPU implementation. The massively parallel matching algorithm based on the suffix array is therefore an efficient approach for high-performance bioinformatics applications.

Keywords: binary search, bioinformatics, GPU, suffix array, suffix tree

I. INTRODUCTION

In recent years, modern multi-core and many-core architectures have been revolutionizing high performance computing (HPC). As more and more processor cores are incorporated into a single chip, the era of the many-core processor is coming. The emergence of many-core architectures, such as compute unified device architecture (CUDA)-enabled GPUs [1], and of other accelerator technologies (field-programmable gate arrays (FPGAs) and the Cell/BE) opens up the possibility of significantly reducing the runtime of many biological algorithms on commonly available and inexpensive hardware with greater computing power. Since the introduction of CUDA in 2007, more than 100 million computers with CUDA-capable Graphics Processing Units have been shipped to end users. In this golden age of GPU computing, with such a low barrier to entry, researchers all over the world have been developing new algorithms and applications to exploit the extreme floating-point execution throughput of these GPUs.

Life science has emerged as a primary application area for GPU computing. High-throughput techniques for DNA sequencing and gene expression analysis have led to an explosion of biological data. Prominent examples are the growth of DNA sequence information in NCBI's GenBank database and the growth of protein sequences in the UniProtKB/TrEMBL database. Furthermore, emerging next-generation sequencing technologies [2] have broken many experimental barriers to genome-scale sequencing. Because GPU performance is growing faster than CPU performance, using GPUs in bioinformatics is an increasingly appropriate strategy.

The suffix tree of a string is the compact trie of all suffixes of that string; it is widely used in bioinformatics applications [3], e.g., MUMmer [4] and MUMmerGPU [5]. There are several approaches to constructing the suffix tree in linear time [6] [7] [8]. Nevertheless, as the reference sequence grows, the suffix tree runs into the bottleneck of Dynamic Random Access Memory (DRAM) consumption. Because the suffix array uses cache memory efficiently and takes only about 20% to 30% of the space of the suffix tree, the suffix array is sometimes preferred to the suffix tree on GPUs; that is, genome sequence matching can be solved efficiently with the suffix array. Meanwhile, on the CPU there exist serial algorithms that construct the suffix array in linear time [9] [10]. In this paper, GPU implementations (suffix tree and suffix array) and optimizations are presented to accelerate multiple genome sequence matching on two different platforms: multi-core (CPU) and many-core (GPU). The GPU implementations show a tremendous performance boost: the suffix array achieves a speedup of more than 99 times over the serial CPU implementation, and the suffix tree's speedup is approximately 44-fold.

II. M
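As a rough illustration of why a sorted suffix array supports pattern matching by binary search, as described in the introduction, the following serial Python sketch may help. It is not the paper's GPU kernel and does not model coalesced memory access or tiling. The function names `build_suffix_array` and `find_occurrences` and the toy reference string are assumptions made for this example, and the construction shown is a naive sort rather than one of the linear-time algorithms cited above.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction by sorting suffix slices; for illustration only,
    not a linear-time algorithm.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions at which `pattern` occurs in `text`.

    Two binary searches over the suffix array `sa` locate the contiguous
    block of suffixes that begin with `pattern`.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


# Toy reference and query set (hypothetical data, not from the paper).
reference = "GATTACAGATTACA"
sa = build_suffix_array(reference)
queries = ["GATT", "TACA", "CCC"]
hits = {q: find_occurrences(reference, sa, q) for q in queries}
# hits == {"GATT": [0, 7], "TACA": [3, 10], "CCC": []}
```

Each query is searched independently against the same array, which is what makes this style of matching a natural candidate for assigning one query per thread on a many-core device. The array itself stores only one integer per reference position, in contrast to the node and edge records of a suffix tree.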

[1] Dan Gusfield, et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.

[2] S. Salzberg, et al. Versatile and open software for comparing large genomes, 2004, Genome Biology.

[3] Kurt Keutzer, et al. The Concurrency Challenge, 2008, IEEE Design & Test of Computers.

[4] Ronald L. Rivest, et al. Introduction to Algorithms, third edition, 2009.

[5] Nuno Roma, et al. Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays, 2011, 2011 International Conference on High Performance Computing & Simulation.

[6] Ulf Assarsson, et al. Fast parallel GPU-sorting using a hybrid algorithm, 2008, J. Parallel Distributed Comput.

[7] Jie Cheng, et al. CUDA by Example: An Introduction to General-Purpose GPU Programming, 2010, Scalable Comput. Pract. Exp.

[8] Esko Ukkonen, et al. On-line construction of suffix trees, 1995, Algorithmica.

[9] David R. Kaeli, et al. Heterogeneous Computing with OpenCL - Revised OpenCL 1.2 Edition, 2012.

[10] William F. Smyth, et al. A taxonomy of suffix array construction algorithms, 2007, CSUR.

[11] Peter Sanders, et al. Linear work suffix array construction, 2006, JACM.

[12] S. Muthukrishnan, et al. On the sorting-complexity of suffix tree construction, 2000, JACM.

[13] Edward M. McCreight, et al. A Space-Economical Suffix Tree Construction Algorithm, 1976, JACM.

[14] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.

[15] Amitabh Varshney, et al. High-throughput sequence alignment using Graphics Processing Units, 2007, BMC Bioinformatics.

[16] J. Kulpa, et al. Time-frequency analysis using NVIDIA compute unified device architecture (CUDA), 2009, Symposium on Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments (WILGA).

[17] Hanlee P. Ji, et al. Next-generation DNA sequencing, 2008, Nature Biotechnology.

[18] Jie Cheng, et al. Programming Massively Parallel Processors. A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.