Longest Common Subsequence in k Length Substrings

In this paper we define a new problem, motivated by computational biology, LCSk, which aims to find the maximal number of k-length substrings that match in both input strings while preserving their order of appearance in the input strings. The traditional LCS definition is a special case of our problem, where k = 1. We provide an algorithm solving the general case in O(n^2) time, where n is the length of the input, equaling the time required for the special case of k = 1. The space requirement is O(kn). In order to enable backtracking of the solution, O(n^2) space is needed.
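
To make the problem statement concrete, the following is a minimal sketch of a quadratic-time dynamic program for the LCSk length. It is an illustration under stated assumptions, not necessarily the authors' exact formulation: the function name `lcsk_length` and the tables `dp` and `suf` are hypothetical, matched k-length substrings are assumed not to overlap, and for simplicity the sketch keeps full O(n^2) tables rather than the O(kn) space mentioned above.

```python
def lcsk_length(a: str, b: str, k: int) -> int:
    """Return the maximal number of ordered, non-overlapping k-length
    substrings that match in both a and b (illustrative sketch)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    n, m = len(a), len(b)
    # dp[i][j]: LCSk value for the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # suf[i][j]: length of the longest common suffix of a[:i] and b[:j].
    suf = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                suf[i][j] = suf[i - 1][j - 1] + 1
            # Option 1: ignore the last character of a or of b.
            best = max(dp[i - 1][j], dp[i][j - 1])
            # Option 2: end a matched k-length substring at (i, j).
            if suf[i][j] >= k:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[n][m]
```

With k = 1 the recurrence reduces to the classical LCS recurrence, for example `lcsk_length("ABCBDAB", "BDCABA", 1)` gives the usual LCS length of the two strings. Since row i only reads rows i - 1 and i - k, keeping the last k + 1 rows would be one way to reach O(kn) space when only the length is needed.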
