Semi-local longest common subsequences in subquadratic time

For two strings a, b of lengths m, n, respectively, the longest common subsequence (LCS) problem consists in comparing a and b by computing the length of their LCS. In this paper, we define a generalisation, called "the all semi-local LCS problem", where each string is compared against all substrings of the other string, and all prefixes of each string are compared against all suffixes of the other string. An explicit representation of the output lengths is of size Θ((m+n)^2). We show that the output can be represented implicitly by a geometric data structure of size O(m+n), allowing efficient queries of the individual output lengths. The currently best all string-substring LCS algorithm by Alves et al., based on previous work by Schmidt, can be adapted to produce the output in this form. We also develop the first all semi-local LCS algorithm, running in time o(mn) when m and n are reasonably close. Compared to a number of previous results, our approach presents an improvement in algorithm functionality, output representation efficiency, and/or running time.
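To make the string-substring part of the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it builds the explicit quadratic-size table rather than the implicit O(m+n) structure, and it runs in O(mn^2) time. The function names `lcs_last_row`, `lcs_length` and `string_substring_lcs` are illustrative assumptions, not names from the source.

```python
def lcs_last_row(a: str, b: str) -> list[int]:
    """Return row[k] = LCS length of a against the prefix b[:k], for 0 <= k <= len(b).

    Standard dynamic programme, O(len(a) * len(b)) time, O(len(b)) space.
    """
    row = [0] * (len(b) + 1)
    for ch in a:
        new_row = [0]
        for j in range(1, len(b) + 1):
            if ch == b[j - 1]:
                # Matching characters extend the LCS of the two shorter prefixes.
                new_row.append(row[j - 1] + 1)
            else:
                # Otherwise drop one character from either a or b.
                new_row.append(max(row[j], new_row[j - 1]))
        row = new_row
    return row


def lcs_length(a: str, b: str) -> int:
    """Plain (global) LCS length of a and b."""
    return lcs_last_row(a, b)[-1]


def string_substring_lcs(a: str, b: str) -> list[list[int]]:
    """Explicit table with table[i][j] = LCS length of a against b[i:j], for i <= j.

    Hypothetical brute-force baseline: one dynamic programme per starting
    position i of the substring, so O(m * n^2) time and Theta(n^2) output.
    Entries with i > j are left as 0.
    """
    n = len(b)
    table = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        # Prefixes of the suffix b[i:] are exactly the substrings b[i:j].
        row = lcs_last_row(a, b[i:])
        for k, value in enumerate(row):
            table[i][i + k] = value
    return table


if __name__ == "__main__":
    table = string_substring_lcs("ab", "aba")
    print(table[0][3])  # a = "ab" against the whole of b = "aba"
    print(table[1][3])  # a = "ab" against the substring "ba"
```

The prefix-suffix comparisons of the full semi-local problem can be tabulated in the same brute-force manner by calling `lcs_length(a[:k], b[i:])` for every pair (k, i); the explicit tables are what the implicit representation described above is meant to replace.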

[1] Dan Gusfield, et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.

[2] Edson Cáceres, et al. An all-substrings common subsequence algorithm, 2008, Discret. Appl. Math.

[3] Sung-Ryul Kim, et al. A dynamic edit distance table, 2004, J. Discrete Algorithms.

[4] Joseph JáJá, et al. Space-Efficient and Fast Algorithms for Multidimensional Dominance Reporting and Counting, 2004, ISAAC.

[5] Rainer E. Burkard, et al. Perspectives of Monge Properties in Optimization, 1996, Discret. Appl. Math.

[6] Wojciech Rytter, et al. Text Algorithms, 1994.

[7] Jon Louis Bentley, et al. Multidimensional divide-and-conquer, 1980, CACM.

[8] Gad M. Landau, et al. Two Algorithms for LCS Consecutive Suffix Alignment, 2004, CPM.

[9] Gad M. Landau, et al. On the Common Substring Alignment Problem, 2001, J. Algorithms.

[10] Gonzalo Navarro, et al. A guided tour to approximate string matching, 2001, CSUR.

[11] Grzegorz Rozenberg, et al. Handbook of Formal Languages, 1997, Springer Berlin Heidelberg.

[12] Gad M. Landau, et al. A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices, 2003, SIAM J. Comput.

[13] Michael Ian Shamos, et al. Computational geometry: an introduction, 1985.

[14] Mike Paterson, et al. A Faster Algorithm Computing String Edit Distances, 1980, J. Comput. Syst. Sci.

[15] Alfred V. Aho, et al. Bounds on the Complexity of the Longest Common Subsequence Problem, 1976, J. ACM.

[16] Gad M. Landau, et al. Incremental String Comparison, 1998, SIAM J. Comput.

[17] Alexander Tiskin. All Semi-local Longest Common Subsequences in Subquadratic Time, 2006, CSR.

[18] P. Pevzner, et al. Computational Molecular Biology, 2000.

[19] Gad M. Landau, et al. Re-Use Dynamic Programming for Sequence Alignment: An Algorithmic Toolkit, 2005.

[20] Alberto Apostolico, et al. String Editing and Longest Common Subsequences, 1997, Handbook of Formal Languages.

[21] Alexander Tiskin. Longest Common Subsequences in Permutations and Maximum Cliques in Circle Graphs, 2006, CPM.

[22] Jeanette P. Schmidt, et al. All Highest Scoring Paths in Weighted Grid Graphs and Their Application to Finding All Approximate Repeats in Strings, 1998, SIAM J. Comput.