Hardness of Longest Common Subsequence for Sequences with Bounded Run-Lengths

The longest common subsequence (LCS) problem is a classic, well-studied problem in computer science, with applications in areas ranging from spelling error correction to molecular biology. This paper focuses on LCS over a fixed-size alphabet for sequences with bounded run-lengths, where the run-length of a sequence is the maximum number of consecutive occurrences of the same symbol. We show that LCS is NP-complete even when restricted to (i) alphabets of size 3 and run-length at most 1, and (ii) alphabets of size 2 and run-length at most 2; both results are tight. For the latter case, we show that the problem is approximable within ratio 3/5.
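The two parameters in the abstract can be made concrete with a short sketch. The Python below is a minimal illustration under our own assumptions: the function names `max_run_length` and `lcs_length` are hypothetical and do not come from the paper. The first computes the run-length of a sequence. The second is the textbook quadratic-time dynamic program for the LCS of *two* sequences, not a method from this paper. The hardness results above concern LCS over an arbitrary number of input sequences. For that setting no polynomial-time algorithm is expected unless P = NP.

```python
from itertools import groupby
from typing import Sequence


def max_run_length(s: Sequence[str]) -> int:
    """Longest block of consecutive equal symbols in s (0 for an empty sequence)."""
    # groupby splits s into maximal runs of equal symbols; measure each run.
    return max((sum(1 for _ in run) for _, run in groupby(s)), default=0)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Textbook O(len(a) * len(b)) dynamic program for the LCS of TWO sequences."""
    # prev[j] = LCS length of the prefix of a processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1  # extend a common subsequence by x
            else:
                cur[j] = max(prev[j], cur[j - 1])  # drop a symbol from a or b
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # "0110100" is binary with runs 0, 11, 0, 1, 00, so it falls under case (ii).
    print(max_run_length("0110100"))  # 2
    print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```

The same recurrence extends to any fixed number k of sequences in O(n^k) time. This is why the NP-completeness results require the number of input sequences to be unbounded, even when both the alphabet size and the run-length are small constants.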
