Fast Algorithms for Local Similarity Queries in Two Sequences

Finding local similarities between given strings is an important, well-known problem in sequence comparison. In this work we introduce two local sequence similarity query problems and present algorithms for them. Our algorithms use a data structure that supports constant-time longest common extension queries. This data structure is built only once, in time linear in the size of the input strings. After this preprocessing step, all subsequent local similarity queries can be answered quickly, whereas existing algorithms take significantly more time to answer them.
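
To make the notion of a longest common extension (LCE) query concrete, the following Python sketch preprocesses two strings once and then answers each query with a constant number of table lookups. It is an illustration under stated assumptions, not the construction used in this work: the class name `LCE`, the separator character, and the choice of a sorted suffix array with a sparse-table range minimum are all hypothetical, and this simple preprocessing is not linear-time (a linear-time build would need a linear suffix array construction and a linear-space range-minimum structure).

```python
class LCE:
    """Longest common extension queries between two strings a and b.

    Illustrative sketch only: preprocessing here is NOT linear time.
    Assumes the separator character occurs in neither input string.
    """

    def __init__(self, a, b, sep="\x00"):
        assert sep not in a and sep not in b
        s = a + sep + b
        n = len(s)
        self.offset = len(a) + 1  # start of b inside the joined string

        # Suffix array by direct sorting (simple, but slow for long inputs).
        sa = sorted(range(n), key=lambda k: s[k:])
        rank = [0] * n
        for r, p in enumerate(sa):
            rank[p] = r

        # Kasai's algorithm: lcp[r] is the longest common prefix length
        # of the suffixes at sa[r - 1] and sa[r].
        lcp = [0] * n
        h = 0
        for p in range(n):
            r = rank[p]
            if r > 0:
                q = sa[r - 1]
                while p + h < n and q + h < n and s[p + h] == s[q + h]:
                    h += 1
                lcp[r] = h
                if h > 0:
                    h -= 1
            else:
                h = 0

        # Sparse table: table[t][i] = min(lcp[i : i + 2**t]).
        table = [lcp]
        k = 1
        while 2 * k <= n:
            prev = table[-1]
            table.append([min(prev[i], prev[i + k])
                          for i in range(n - 2 * k + 1)])
            k *= 2

        self.rank = rank
        self.table = table

    def query(self, i, j):
        """Length of the longest common prefix of a[i:] and b[j:].

        Requires 0 <= i < len(a) and 0 <= j < len(b).
        """
        x = self.rank[i]
        y = self.rank[self.offset + j]
        lo, hi = min(x, y) + 1, max(x, y)  # inclusive range in lcp
        t = (hi - lo + 1).bit_length() - 1
        row = self.table[t]
        return min(row[lo], row[hi - (1 << t) + 1])
```

With `lce = LCE("abcab", "xabcy")`, the call `lce.query(0, 1)` compares `"abcab"` with `"abcy"` and returns 3. Each query reads two ranks and two sparse-table entries, which is what makes the per-query cost constant once the tables exist.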
