An all-substrings common subsequence algorithm

Abstract: Given two strings A and B of lengths n_a and n_b, respectively, with n_a ≤ n_b, the all-substrings longest common subsequence (ALCS) problem obtains, for every substring B′ of B, the length of the longest string that is a subsequence of both A and B′. The ALCS problem has many applications, such as finding approximate tandem repeats in strings, solving the circular alignment of two strings and finding the alignment of one string with several others that have a common substring. We present an algorithm to prepare the basic data structure for ALCS queries that takes O(n_a n_b) time and O(n_a + n_b) space. After this preparation, it is possible to build a matrix of size O(n_b^2) that allows any LCS length to be retrieved in constant time. Some trade-offs between the space required and the querying time are discussed. To our knowledge, this is the first algorithm in the literature for the ALCS problem.
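To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it simply reruns the textbook LCS dynamic program of A against every suffix of B, which costs O(n_a n_b^2) time rather than the O(n_a n_b) preparation described above. The function name `alcs_table` and the half-open indexing convention `b[i:j]` are assumptions made for illustration only.

```python
def alcs_table(a: str, b: str) -> list[list[int]]:
    """Return table where table[i][j] is the LCS length of a and b[i:j].

    Naive baseline (assumed for illustration, not the paper's method):
    one standard LCS dynamic program per starting position i in b.
    Entries with j < i are unused and left at 0.
    """
    n_a, n_b = len(a), len(b)
    table = [[0] * (n_b + 1) for _ in range(n_b + 1)]
    for i in range(n_b):
        # prev[k] = LCS length of a[:k] and the current substring of b,
        # starting from the empty substring b[i:i].
        prev = [0] * (n_a + 1)
        for j in range(i, n_b):
            cur = [0] * (n_a + 1)
            for k in range(1, n_a + 1):
                if a[k - 1] == b[j]:
                    cur[k] = prev[k - 1] + 1
                else:
                    cur[k] = max(prev[k], cur[k - 1])
            # After consuming b[j], the substring is b[i:j+1].
            table[i][j + 1] = cur[n_a]
            prev = cur
    return table


if __name__ == "__main__":
    t = alcs_table("abc", "cabc")
    print(t[0][4])  # LCS length of "abc" and "cabc"
    print(t[0][3])  # LCS length of "abc" and "cab"
```

Once such a table exists, any query for a substring B′ = b[i:j] is a single lookup `t[i][j]`, which mirrors the O(n_b^2)-size matrix with constant-time retrieval mentioned in the abstract; only the cost of building it differs in this sketch.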
