Background: We present swps3, a vectorized implementation of the Smith-Waterman local alignment algorithm optimized for both the Cell/BE and x86 architectures. The paper describes swps3 and compares its performance with that of several other implementations.

Findings: Our benchmarking results show that swps3 is currently the fastest vectorized implementation of Smith-Waterman on the Cell/BE. It outperforms the only other known implementation by a factor of at least 4: on a PlayStation 3, it achieves up to 8.0 billion cell updates per second (GCUPS). Using the SSE2 instruction set, a quad-core Intel Pentium can reach 15.7 GCUPS. We also show that swps3 on this CPU is faster than a recent GPU implementation. Finally, we note that under some circumstances, alignments are computed at roughly the same speed as with BLAST, a heuristic method.

Conclusion: The Cell/BE can be a powerful platform for aligning biological sequences. Moreover, the performance gap between exact and heuristic methods has almost disappeared, especially for long protein sequences.

Background

Alignments are used in bioinformatics to compare biological sequences. The gold standard of sequence alignment is the optimal local sequence alignment with affine gap costs by Smith and Waterman [1,2]. Modern implementations achieve high performance through SIMD (single instruction, multiple data) instructions, which apply the same operation to multiple values in parallel. Such vectorized implementations for general-purpose desktop processors include previous work by Wozniak [3], Rognes and Seeberg [4], and Farrar [5]. Farrar's implementation is, by a significant margin, the fastest on x86 architectures with the SSE2 (Streaming SIMD Extensions 2) instruction set. As for other platforms, Sachdeva et al. [6] ported the AltiVec kernel of ssearch34 from the FASTA package [7,3] to the Cell/BE, but to our knowledge no implementation is publicly available. Another solution has been provided by Manavski and Valle [8] on general-purpose graphics hardware.
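For reference, here is what the vectorized kernels ultimately compute. The following is a minimal scalar sketch of Smith-Waterman scoring with Gotoh's affine gap recurrences, written in Python rather than the C/SIMD code of swps3. It returns only the optimal score. The function names, the toy scoring function and the gap convention (a gap of length k costs gap_open + (k - 1) * gap_extend) are assumptions made for illustration.

```python
NEG_INF = float("-inf")


def sw_affine(a, b, score, gap_open, gap_extend):
    """Optimal local alignment score of sequences a and b.

    Smith-Waterman recurrences with Gotoh's affine gaps. Assumed gap
    convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    Only the score is returned; no traceback is kept.
    """
    n = len(b)
    h_prev = [0] * (n + 1)        # H row i-1: best local score ending at (i-1, j)
    f_prev = [NEG_INF] * (n + 1)  # F row i-1: best score ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF               # E: best score ending in a horizontal gap at (i, j)
        for j in range(1, n + 1):
            # Either open a new gap from H or extend the current one.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            # Clamping at 0 is what makes the alignment local.
            h_cur[j] = max(0, h_prev[j - 1] + score(a[i - 1], b[j - 1]), e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def toy_score(x, y):
    """Hypothetical scoring function standing in for a matrix such as BLOSUM50."""
    return 2 if x == y else -3


print(sw_affine("AAAA", "AATAA", toy_score, gap_open=3, gap_extend=1))  # 5
```

In the example, the best alignment opens one gap (AA-AA against AATAA): 4 matches score 8, and the gap costs 3. This beats accepting the A/T mismatch. The vectorized implementations evaluate the same recurrences but process many matrix cells per instruction.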
In this article, we introduce swps3, an implementation of the Smith-Waterman algorithm that extends Farrar's work to the IBM Cell/BE platform. On that platform, the runtime improvement over the results reported by Sachdeva et al. [6] is at least fourfold. The code also improves on Farrar's work on x86 architectures, mainly by supporting multi-core processors. In the following, we first present benchmarking results achieved with swps3 and compare them with other implementations. In the second part of the article, we discuss implementation details and the improvements over Farrar's algorithm.

Published: 29 October 2008. BMC Research Notes 2008, 1:107. doi:10.1186/1756-0500-1-107. Received: 24 July 2008. Accepted: 29 October 2008. This article is available from: http://www.biomedcentral.com/1756-0500/1/107 © 2008 Szalkowski et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Results

By implementing Farrar's algorithm on the IBM Cell/BE and exploiting all available processor cores, swps3 achieves higher alignment speed than previous implementations. We compare swps3 with the following tools: ssearch35 [7], swsse [5], WU-BLAST 2.0 [9], and NCBI-BLAST 2.2.18 [10]. The queries are protein sequences aligned against release 55.1 of the Swiss-Prot [11] database, which contains 129,199,355 amino acids in 359,942 sequences. The set of query sequences extends Farrar's [5] with 7 longer sequences of up to 4000 amino acids. Throughout the tests, we use the BLOSUM50 scoring matrix [12]. Curiously, the speed of NCBI-BLAST appears to be highly sensitive to the scoring matrix.
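swps3 builds on Farrar's striped approach. In that approach, the query is split into as many segments as there are SIMD lanes, for example 16 lanes of 8-bit scores or 8 lanes of 16-bit scores in a 128-bit SSE2 register. The lanes of one vector therefore hold query positions that are exactly one segment apart. The Python sketch below illustrates that layout and the matching query profile. The profile is a per-residue table of striped score vectors, looked up once per database residue instead of once per cell. It is an illustration under these assumptions, not the actual data structures of swps3 or swsse. The function names and the padding score are hypothetical.

```python
import math


def striped_layout(query_len, lanes):
    """Query positions held by each SIMD vector in a striped layout.

    The query is cut into `lanes` segments of length seg_len. Vector k holds
    positions k, k + seg_len, k + 2 * seg_len, ...  Positions past the end
    of the query are padding and are marked None.
    """
    seg_len = math.ceil(query_len / lanes)
    return [
        [lane * seg_len + k if lane * seg_len + k < query_len else None
         for lane in range(lanes)]
        for k in range(seg_len)
    ]


def striped_profile(query, alphabet, score, lanes, pad=0):
    """Query profile: for every database residue c, the striped vectors of
    score(query[pos], c). The padding score `pad` is an assumption."""
    layout = striped_layout(len(query), lanes)
    return {
        c: [[score(query[p], c) if p is not None else pad for p in vec]
            for vec in layout]
        for c in alphabet
    }


print(striped_layout(10, 4))
# [[0, 3, 6, 9], [1, 4, 7, None], [2, 5, 8, None]]
```

Consecutive query positions fall into consecutive vectors rather than into neighbouring lanes of the same vector. As a result, vector k of one database column depends on vector k - 1 of the previous column, and only the first vector needs a lane shift. Farrar resolves the remaining vertical-gap dependencies with a separate correction pass ("lazy F").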
For instance, we observed NCBI-BLAST runs that were twice as fast using BLOSUM62. All benchmarks were performed on Gentoo Linux with a 64-bit 2.6 kernel, running either on an Intel Pentium Core 2 Quad Q6600 (2.4 GHz) or on a Sony PlayStation 3 featuring a Cell/BE (3.2 GHz) and 256 MiB XDRAM. Note that in this configuration, only six of the eight SPEs are available to the user. Figure 1 presents the benchmarking results of our tool on different multi-core architectures. To put these results into a broader context, we included the runtimes of multithreaded WU-BLAST and NCBI-BLAST, converted to a GCUPS equivalent, as well as performance data obtained by Manavski and Valle [8] on a GPU architecture. The runtimes of ssearch and swsse are roughly the same as that of swps3, so they are omitted from the figure for clarity. Throughout the benchmark, the Intel Pentium Q6600 was the fastest platform. On that machine, swps3 reaches

Figure 1: Performance Evaluation. Performance of gapped local alignment implementations on different multi-core architectures in GCUPS.
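GCUPS counts cell updates of the dynamic-programming matrix. Aligning a query of length m against a database of N residues in total updates m × N cells. The small Python sketch below shows this arithmetic and its inverse. The function names are hypothetical. We also assume that the BLAST "GCUPS equivalent" is obtained the same way: the full matrix size is divided by the measured runtime, even though BLAST does not actually fill every cell.

```python
SWISSPROT_55_1_RESIDUES = 129_199_355  # database size reported in the text


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second: every (query residue, database residue)
    pair is one cell of the dynamic-programming matrix."""
    return query_len * db_residues / seconds / 1e9


def seconds_at(query_len, db_residues, rate_gcups):
    """Runtime implied by a given GCUPS rate (the inverse of gcups)."""
    return query_len * db_residues / (rate_gcups * 1e9)


# Hypothetical 1000-residue query at the 15.7 GCUPS peak quoted above.
print(round(seconds_at(1000, SWISSPROT_55_1_RESIDUES, 15.7), 2))  # 8.23
```

Take a hypothetical 1000-residue query searched against Swiss-Prot 55.1. If the 15.7 GCUPS peak were sustained over the whole search, it would correspond to roughly 8.2 seconds.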
References
[1] Maria Jesus Martin, et al.: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research, 2003.
[2] Nicholas Nethercote, et al.: Dynamic Binary Analysis and Instrumentation. 2004.
[3] Giorgio Valle, et al.: CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics, 2008.
[4] Andrzej Wozniak, et al.: Using video-oriented instructions to speed up sequence comparison. Comput. Appl. Biosci., 1997.
[5] M. S. Waterman, et al.: Identification of common molecular subsequences. Journal of Molecular Biology, 1981.
[6] Michael Farrar, et al.: Striped Smith-Waterman speeds database searches six times over other SIMD implementations. 2007.
[7] Michael Kistler, et al.: Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications. 2007 IEEE International Parallel and Distributed Processing Symposium, 2007.
[8] D. Lipman, et al.: Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America, 1988.
[9] S. Henikoff, et al.: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America, 1992.
[10] Thomas L. Madden, et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997.
[11] O. Gotoh: An improved algorithm for matching biological sequences. Journal of Molecular Biology, 1982.
[12] Torbjørn Rognes, et al.: Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics, 2000.