CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences

Biological sequence comparison is a very important operation in Bioinformatics. Although exact methods for comparing biological sequences exist, they are often neglected because of their quadratic time and space complexity. To accelerate these methods, many GPU algorithms have been proposed in the literature. Nevertheless, all of them restrict the size of the smaller sequence in a way that prevents the comparison of Megabase genomes. In this paper, we propose and evaluate CUDAlign, a GPU algorithm that is able to compare Megabase biological sequences with an exact Smith-Waterman affine gap variant. CUDAlign was implemented in CUDA and tested on two GPU boards, separately. For real sequences whose sizes range from 1 MBP (Megabase Pairs) to 47 MBP, a close to uniform GCUPS (Giga Cell Updates per Second) was obtained, showing the potential scalability of our approach. CUDAlign was also able to compare human chromosome 21 and chimpanzee chromosome 22. This comparison took 21 hours on a GeForce GTX 280 and reached a peak performance of 20.375 GCUPS. As far as we know, this is the first time chromosomes this large have been compared with an exact method.
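To make the two technical terms above concrete, the sketch below shows the Smith-Waterman local-alignment recurrences with Gotoh's affine gap model and the GCUPS metric. It is a minimal sequential CPU sketch in Python, not CUDAlign's GPU implementation. It computes only the best local score, keeping one row of the matrices in memory. The scoring constants (`MATCH`, `MISMATCH`, `GAP_OPEN`, `GAP_EXT`) and the function names `sw_affine_score` and `gcups` are hypothetical and were chosen for illustration only.

```python
# Sequential sketch of the Smith-Waterman local-alignment score with
# Gotoh's affine gap model, using O(n) memory (one row of H and F).
# Scoring values are hypothetical, chosen only for illustration.
MATCH = 1
MISMATCH = -3
GAP_OPEN = 5   # cost of the first position of a gap
GAP_EXT = 2    # cost of each further position of the same gap
NEG_INF = float("-inf")


def sw_affine_score(a: str, b: str) -> int:
    """Return the best local alignment score between a and b."""
    n = len(b)
    H = [0] * (n + 1)        # H[j] holds H[i-1][j], then is overwritten with H[i][j]
    F = [NEG_INF] * (n + 1)  # F[j]: best score ending with a[i] aligned to a gap
    best = 0
    for i in range(1, len(a) + 1):
        diag = 0             # H[i-1][0]
        left = 0             # H[i][0]
        E = NEG_INF          # E[i][0]: best score ending with b[j] aligned to a gap
        for j in range(1, n + 1):
            E = max(E - GAP_EXT, left - GAP_OPEN)           # extend or open horizontal gap
            F[j] = max(F[j] - GAP_EXT, H[j] - GAP_OPEN)     # extend or open vertical gap
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            h = max(0, diag + s, E, F[j])                   # 0 restarts a local alignment
            diag = H[j]      # keep H[i-1][j] as the next cell's diagonal
            H[j] = h
            left = h
            best = max(best, h)
    return best


def gcups(m: int, n: int, seconds: float) -> float:
    """Billions of DP cells (m * n) updated per second."""
    return m * n / (seconds * 1e9)
```

For example, `sw_affine_score("ACGTTGCAGATCCTAG", "ACGTTGCATGATCCTAG")` returns 11, which is sixteen matches minus the cost of one gap opening. The m·n term in `gcups` is also why Megabase comparisons are expensive: a 1 MBP by 1 MBP comparison already requires 10^12 cell updates.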
