Performance comparison of GPU programming frameworks with the striped Smith-Waterman algorithm

This paper evaluates and discusses how different GPU programming frameworks affect the performance obtained from GPU acceleration of the striped Smith-Waterman algorithm, which is used for biological sequence alignment. A total of six GPU implementations of the algorithm, running on the NVIDIA GT200b and the AMD RV870 and written with the CUDA and OpenCL frameworks, are compared to analyze the pros and cons of explicitly describing architecture-specific hardware mechanisms in the code. The evaluation results show that low-level (primitive) descriptions written in CUDA are still efficient, especially for small data sizes, while the OpenCL compiler carries out better instruction scheduling and optimization. On the other hand, the combination of OpenCL and the RV870, which provides a relatively simple view of the architecture, is efficient for large data sizes.
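The striped Smith-Waterman algorithm lays the query out so that each SIMD lane handles a strided slice of it: lane k of segment i holds query position i + k·segLen. Vertical-gap (F) dependencies that cross lane boundaries are then repaired in a separate "lazy F" pass. The Python sketch below emulates V lanes with plain lists to show that data layout and the two loops. It is not the paper's CUDA or OpenCL code. The lane width `V`, the function names `sw_striped` and `sw_gotoh`, the default scores, and the simple match/mismatch scoring (standing in for a substitution matrix) are all illustrative assumptions. As a conservative choice, the sketch also updates the horizontal-gap (E) values inside the lazy-F loop, so the next column stays consistent after a correction. A plain Gotoh reference is included for cross-checking.

```python
from typing import Dict, List

V = 4                      # emulated SIMD width (assumption: stands in for hardware lanes)
NEG_INF = float("-inf")    # "minus infinity" for empty E/F values and padded lanes


def shift_in(vec: List[float], fill: float) -> List[float]:
    """Move lane k to lane k + 1 and put `fill` into lane 0."""
    return [fill] + vec[:-1]


def sw_striped(query: str, db: str, match: int = 2, mismatch: int = -1,
               gap_open: int = 3, gap_extend: int = 1) -> int:
    """Best local-alignment score using a striped query layout (sketch).

    A gap of length k costs gap_open + (k - 1) * gap_extend; requires gap_open >= gap_extend.
    """
    if not query or not db:
        return 0
    m = len(query)
    seg_len = (m + V - 1) // V
    # Striped query profile: lane k of segment i holds query position i + k * seg_len.
    profile: Dict[str, List[List[float]]] = {}
    for a in set(query) | set(db):
        profile[a] = [[(match if query[i + k * seg_len] == a else mismatch)
                       if i + k * seg_len < m else NEG_INF  # padding beyond the query end
                       for k in range(V)] for i in range(seg_len)]
    h_store = [[0] * V for _ in range(seg_len)]       # H values of the previous db residue
    e_vec = [[NEG_INF] * V for _ in range(seg_len)]   # E (horizontal gaps) for the current residue
    best = 0
    for c in db:
        prof = profile[c]
        v_f = [NEG_INF] * V                            # F (vertical gaps) entering segment 0
        v_h = shift_in(h_store[seg_len - 1], 0)        # diagonal H values for segment 0
        h_load, h_store = h_store, [[0] * V for _ in range(seg_len)]
        for i in range(seg_len):
            for k in range(V):
                h = max(v_h[k] + prof[i][k], 0, e_vec[i][k], v_f[k])
                h_store[i][k] = h
                h_open = h - gap_open
                e_vec[i][k] = max(e_vec[i][k] - gap_extend, h_open)  # E for the next residue
                v_f[k] = max(v_f[k] - gap_extend, h_open)            # F for the next segment
            v_h = h_load[i]                            # diagonal H values for segment i + 1
        # Lazy-F loop: carry F across lane boundaries until no lane can still improve H.
        v_f = shift_in(v_f, NEG_INF)
        i = 0
        while any(v_f[k] > h_store[i][k] - gap_open for k in range(V)):
            for k in range(V):
                if v_f[k] > h_store[i][k]:
                    h_store[i][k] = v_f[k]
                    e_vec[i][k] = max(e_vec[i][k], v_f[k] - gap_open)  # keep E consistent
                v_f[k] -= gap_extend
            i += 1
            if i == seg_len:
                i, v_f = 0, shift_in(v_f, NEG_INF)
        best = max(best, max(max(seg) for seg in h_store))
    return best


def sw_gotoh(query: str, db: str, match: int = 2, mismatch: int = -1,
             gap_open: int = 3, gap_extend: int = 1) -> int:
    """Plain O(mn) Smith-Waterman-Gotoh reference with the same scoring, for cross-checking."""
    m, n = len(query), len(db)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    E = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    F = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            s = match if query[i - 1] == db[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

For example, `sw_striped("ACGT", "ACGT")` returns 8 (four matches at +2 each), the same value `sw_gotoh` gives. The inner segment loop carries no lane-to-lane dependency, which is what makes the layout vector-friendly. The cross-lane work is confined to the lazy-F pass, which is expected to exit early in most columns.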
