Faster STR-IC-LCS computation via RLE

The constrained LCS problem asks for a longest common subsequence of two input strings $A$ and $B$ that satisfies additional constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem in which the solution must include a given constraint string $C$ as a substring. Given two strings $A$ and $B$ of respective lengths $M$ and $N$, and a constraint string $C$ of length at most $\min\{M, N\}$, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz~({\em Inf. Process. Lett.}, 11:423--426, 2012), runs in $O(MN)$ time. In this work, we present an $O(mN + nM)$-time solution to the STR-IC-LCS problem, where $m$ and $n$ denote the sizes of the run-length encodings (RLE) of $A$ and $B$, respectively. Since $m \leq M$ and $n \leq N$ always hold, we have $mN + nM \leq 2MN$, so our algorithm is asymptotically never slower than Deorowicz's algorithm, and it is faster when the input strings are compressible via RLE.
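To make the problem statement and the parameters $m$, $n$ concrete, the following is a minimal Python sketch. It is not the $O(mN + nM)$ algorithm of this work; it is a plain quadratic-style baseline that combines a prefix LCS table, a suffix LCS table, and the shortest occurrence of $C$ from each starting position, and it computes that occurrence naively, so it does not meet the $O(MN)$ bound in the worst case. The function names `rle`, `lcs_table`, `minimal_ends`, and `str_ic_lcs_length` are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def rle(s):
    """Run-length encode s as a list of (character, run_length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def lcs_table(a, b):
    """Return t where t[i][j] is the LCS length of a[:i] and b[:j]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t


def minimal_ends(s, c):
    """ends[i] is the smallest e such that c is a subsequence of s[i:e]
    with s[i] matched to c[0]; None if no such occurrence exists.
    Computed naively by greedy scanning (an assumption of this sketch)."""
    ends = [None] * len(s)
    for i in range(len(s)):
        if s[i] != c[0]:
            continue
        k = 0
        for e in range(i, len(s)):
            if s[e] == c[k]:
                k += 1
                if k == len(c):
                    ends[i] = e + 1
                    break
    return ends


def str_ic_lcs_length(a, b, c):
    """Length of a longest common subsequence of a and b that contains c
    as a substring, or -1 if no such common subsequence exists."""
    if not c:
        return lcs_table(a, b)[len(a)][len(b)]
    pre = lcs_table(a, b)
    # suf[i][j] is the LCS length of the last i characters of a
    # and the last j characters of b.
    suf = lcs_table(a[::-1], b[::-1])
    ends_a, ends_b = minimal_ends(a, c), minimal_ends(b, c)
    best = -1
    for i, ea in enumerate(ends_a):
        if ea is None:
            continue
        for j, eb in enumerate(ends_b):
            if eb is None:
                continue
            total = pre[i][j] + len(c) + suf[len(a) - ea][len(b) - eb]
            best = max(best, total)
    return best
```

For example, `rle("aaabbc")` yields three runs, so a string of length $M = 6$ has RLE size $m = 3$ here; the gap between $m$ and $M$ is what the $O(mN + nM)$ bound exploits.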
