CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters

This paper proposes and evaluates a parallel strategy for executing the exact Smith-Waterman (SW) biological sequence comparison algorithm on huge DNA sequences in multi-GPU platforms. In our strategy, the computation of a single huge SW matrix is spread over multiple GPUs, each of which sends its border elements to its neighbour through a circular buffer mechanism. We also provide a method to predict the execution time and speedup of a comparison, given the number of GPUs and the sizes of the sequences. Results obtained in a large multi-GPU environment show that our solution scales when varying the sizes of the sequences and/or the number of GPUs, and that our prediction method is accurate. With our proposal, we were able to compare the largest human chromosome with its homologous chimpanzee chromosome (249 million base pairs (MBP) x 228 MBP) using 64 GPUs, achieving 1.7 TCUPS (Tera Cells Updated per Second). As far as we know, this is the largest comparison ever done using the Smith-Waterman algorithm.
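
The idea of spreading one SW matrix over several devices that exchange border elements can be illustrated with a minimal CPU-only sketch. It is not the CUDAlign implementation: the scoring values (`MATCH`, `MISMATCH`, `GAP`), the linear gap model, the column-wise split, and the function names are all assumptions chosen for illustration, and Python generators stand in for the circular buffer that carries border cells from one GPU to the next.

```python
# Illustrative sketch only: scores and names below are hypothetical.
MATCH, MISMATCH, GAP = 1, -1, -2


def sw_score(a, b):
    """Reference: best local-alignment score using one full matrix."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            cur[j] = max(0, prev[j - 1] + s, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def slice_worker(a, chunk, left_border, best):
    """Compute one vertical slice of the matrix, row by row.

    left_border yields the last-column value of the previous slice for
    rows 1..len(a); this worker yields its own last-column values so the
    next slice can consume them (the role of the border buffer).
    """
    prev = [0] * (len(chunk) + 1)  # row 0 of the matrix is all zeros
    for i, left in enumerate(left_border, start=1):
        cur = [left] + [0] * len(chunk)
        for j in range(1, len(chunk) + 1):
            s = MATCH if a[i - 1] == chunk[j - 1] else MISMATCH
            cur[j] = max(0, prev[j - 1] + s, prev[j] + GAP, cur[j - 1] + GAP)
            best[0] = max(best[0], cur[j])
        prev = cur
        yield cur[-1]


def sw_score_partitioned(a, b, n_workers):
    """Split the columns of the matrix into n_workers chained slices."""
    bounds = [len(b) * k // n_workers for k in range(n_workers + 1)]
    best = [0]
    border = (0 for _ in a)  # column 0 of the matrix is all zeros
    for k in range(n_workers):
        chunk = b[bounds[k]:bounds[k + 1]]
        border = slice_worker(a, chunk, border, best)
    for _ in border:  # drain the last slice; this drives the whole chain
        pass
    return best[0]
```

Calling `sw_score_partitioned(a, b, n)` should return the same score as `sw_score(a, b)` for any `n >= 1`, because each slice needs only the border column of its left neighbour. In this sketch the slices are interleaved lazily on a single thread; on real hardware the slices would run concurrently, with the buffer letting a producer run ahead of its consumer.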
