Using frequency distance filtration for reducing database search workload on a GPU-based cloud service

The Smith-Waterman algorithm is the most widely used algorithm for analyzing the similarity between protein and DNA sequences, and its high sensitivity makes it well suited to database search. However, Smith-Waterman remains very time-consuming. CUDA programming can accelerate the computation by exploiting the massively parallel computing power of GPUs. In this paper, we propose an efficient frequency-based filtering method: rather than only speeding up each Smith-Waterman comparison while still spending computing resources on unnecessary comparisons, we filter those comparisons out before alignment. We implemented the Smith-Waterman algorithm using techniques from earlier research and added our real-time filtering method, which runs on the graphics processing unit (GPU). We also designed a user-friendly interface for offering the service in a potential cloud computing environment. We evaluated two data sets, the H1N1 VH protein database and the Human protein database, and compared CUDA-SW against CUDA-SW with the filter, which we call CUDA-SWf. CUDA-SWf achieves up to a 41% performance improvement by reducing unnecessary sequence alignments.
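The abstract does not spell out the filter itself, so the following is only a minimal CPU-side sketch of the general idea, assuming one common definition of frequency distance: count how often each symbol occurs in two sequences and take the larger of the total surplus and the total deficit. This value is a lower bound on the edit distance, so a database sequence whose frequency distance to the query exceeds a cutoff can be skipped without running Smith-Waterman on it. The function names and the `threshold` parameter are hypothetical and are not taken from the paper, whose filter runs on the GPU.

```python
from collections import Counter


def frequency_distance(a, b):
    """Frequency distance between two sequences (assumed definition).

    Counts each symbol in both sequences, then returns the larger of
    the total surplus of `a` over `b` and the total surplus of `b` over `a`.
    """
    fa, fb = Counter(a), Counter(b)
    # Counter subtraction keeps only the positive differences.
    surplus = sum((fa - fb).values())
    deficit = sum((fb - fa).values())
    return max(surplus, deficit)


def filter_candidates(query, database, threshold):
    """Keep only sequences close enough to the query to be worth aligning.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    return [s for s in database if frequency_distance(query, s) <= threshold]


if __name__ == "__main__":
    db = ["ACGT", "ACGA", "TTTT"]
    survivors = filter_candidates("ACGT", db, threshold=1)
    # Only the survivors would be passed on to the Smith-Waterman step.
    print(survivors)
```

In a setup like this, the filter costs time linear in the sequence lengths, whereas a Smith-Waterman comparison costs time proportional to the product of the two lengths, which is why discarding candidates early can reduce the overall workload.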
