Optimizing Smith-Waterman for the Cell Broadband Engine

Motivation: As new processors become available, Single-Instruction Multiple-Data (SIMD) Smith-Waterman implementations need to be adapted to each processor's instruction set to get maximum performance. One recent processor, the Cell Broadband Engine, has eight independent vector processors. To take advantage of the Cell's vector engines, the implementation needs to take into account the limited resources of the vector engine and the limits of the instruction set.

Results: The adapted Smith-Waterman implementation running on a single 3.2 GHz Cell Broadband Engine achieved speeds of >16 billion cell updates per second, with the ability to handle sequences of 32K residues.

Availability: http://farrar.michael.googlepages.com/striped.tgz
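The throughput figure in the Results is counted in cell updates, where one update fills one entry of the Smith-Waterman dynamic-programming matrix. The following is a minimal scalar sketch of that unit of work, not the striped SIMD implementation described here. The function names `smith_waterman_score` and `gcups`, the match/mismatch scores, and the gap penalties are illustrative assumptions; a real protein search would use a substitution matrix.

```python
NEG_INF = float("-inf")


def smith_waterman_score(query, target, match=2, mismatch=-1,
                         gap_open=3, gap_extend=1):
    """Scalar Smith-Waterman with affine gaps.

    Returns (best_local_score, cell_updates). gap_open is the cost of the
    first gapped residue and gap_extend the cost of each further one
    (assumed values, chosen only for illustration).
    """
    n = len(target)
    h_prev = [0] * (n + 1)      # H values of the previous row
    f = [NEG_INF] * (n + 1)     # F: best score ending in a gap down a column
    best = 0
    updates = 0
    for q in query:
        h_row = [0] * (n + 1)
        e = NEG_INF             # E: best score ending in a gap along the row
        for j in range(1, n + 1):
            e = max(e - gap_extend, h_row[j - 1] - gap_open)
            f[j] = max(f[j] - gap_extend, h_prev[j] - gap_open)
            s = match if q == target[j - 1] else mismatch
            h = max(0, h_prev[j - 1] + s, e, f[j])
            h_row[j] = h
            best = max(best, h)
            updates += 1        # one matrix cell filled = one cell update
        h_prev = h_row
    return best, updates


def gcups(query_len, database_residues, seconds):
    """Billions of cell updates per second for a database search."""
    return query_len * database_residues / seconds / 1e9
```

Under these assumptions, comparing a query of length m with a target of length n costs m × n cell updates, so a rate of 16 billion updates per second corresponds to, for example, a 1000-residue query scanned against 16 million database residues in about one second.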
