Bit-parallel Computation of Local Similarity Score Matrices with Unitary Weights

Local similarity computation between two sequences permits detecting all the relevant alignments present between subsequences thereof. A well-known dynamic programming algorithm works in O(mn) time, where m and n are the lengths of the two sequences. This algorithm is rather slow when applied to many sequence pairs. In this paper we present the first bit-parallel computation of the score matrix, for a simplified choice of scores. If the computer word has w bits, the resulting algorithm works in O(mn log min(m, n, w) / w) time, achieving up to 8-fold speedups in practice. Some DNA comparison applications use precisely the simplified scores we handle, so our algorithm is directly applicable to them. In other applications, our method can serve as a coarse filter that discards most of the strings, so that the classical algorithm is applied only to the substring pairs that can yield relevant results.
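For reference, the classical quadratic dynamic programming computation of the local similarity score matrix can be sketched as follows. This is a minimal sketch of the baseline, not the bit-parallel algorithm. It assumes Smith-Waterman-style local similarity, and it assumes that "unitary weights" means +1 for a match and -1 for a mismatch, insertion, or deletion. The function names `local_score_matrix` and `high_scoring_cells` are hypothetical.

```python
from typing import List, Tuple


def local_score_matrix(a: str, b: str) -> List[List[int]]:
    """Classical O(mn) local similarity score matrix.

    Assumed unitary weights: +1 match, -1 mismatch, -1 insertion/deletion.
    H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1];
    scores never drop below 0, so an alignment may start anywhere.
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = H[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else -1)
            up = H[i - 1][j] - 1     # gap in b (deletion of a[i-1])
            left = H[i][j - 1] - 1   # gap in a (insertion of b[j-1])
            H[i][j] = max(0, diag, up, left)
    return H


def high_scoring_cells(a: str, b: str, threshold: int) -> List[Tuple[int, int]]:
    """Hypothetical filter step: cells (i, j) whose local score reaches threshold."""
    H = local_score_matrix(a, b)
    return [
        (i, j)
        for i in range(1, len(a) + 1)
        for j in range(1, len(b) + 1)
        if H[i][j] >= threshold
    ]
```

For example, `high_scoring_cells("ACGT", "AGT", 2)` returns `[(4, 3)]`, the end of the alignment of `GT` with `GT`. Here every cell is still computed one at a time in O(mn) time. That cell-by-cell cost is the cost the bit-parallel computation reduces by processing w cells per machine word.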
