Efficient Algorithms for the Flexible Longest Common Subsequence Problem

Given two sequences, the traditional longest common subsequence (LCS) problem asks for a common subsequence with the maximum number of matches, without considering the continuity of the matched characters. In many applications, however, matching results with higher continuity are more meaningful than sparse ones, even if they contain slightly fewer matched characters. Accordingly, we define a new variant of the LCS problem, called the flexible longest common subsequence (FLCS) problem. In this paper, we design a scoring function to estimate the continuity of a matching result between two strings. We show that an optimal solution of FLCS can be found in O(n^2) time, where n denotes the length of the longer of the two input sequences. The results in this paper therefore offer a new efficient tool for sequence analysis.
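The abstract does not spell out the paper's scoring function, so the following is only an illustrative sketch under a hypothetical one, not the authors' method. It assumes that each matched character scores 1 and that each match immediately following the previous match in both strings earns an extra non-negative `bonus`. The function name `flexible_lcs_score` and the `bonus` parameter are hypothetical. With `bonus = 0` the recurrence reduces to the classical LCS dynamic program. A second table tracks alignments that end exactly at the current cell, which is what lets a contiguous run collect its bonus.

```python
import math


def flexible_lcs_score(a: str, b: str, bonus: float = 1.0) -> float:
    """Best score of a common subsequence of `a` and `b` under a HYPOTHETICAL
    continuity-aware score: each matched character scores 1, and each match
    that directly follows the previous match in BOTH strings earns an extra
    `bonus`. With bonus == 0 this is the classical LCS length.
    (Illustrative assumption only; not the FLCS scoring function of the paper.)
    """
    if bonus < 0:
        # The max-based recurrence below is only exact for non-negative bonuses.
        raise ValueError("this recurrence assumes a non-negative bonus")
    m, n = len(a), len(b)
    neg_inf = -math.inf
    # best[i][j]: best score over the prefixes a[:i], b[:j] (empty alignment allowed).
    best = [[0.0] * (n + 1) for _ in range(m + 1)]
    # ends[i][j]: best score over a[:i], b[:j] whose last match pairs a[i-1] with b[j-1].
    ends = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Append (i, j) to the best alignment of the shorter prefixes without
                # claiming a bonus, or extend a run whose last match was (i-1, j-1)
                # and collect the continuity bonus.
                ends[i][j] = 1 + max(best[i - 1][j - 1], ends[i - 1][j - 1] + bonus)
            # Either skip a[i-1], skip b[j-1], or end the alignment with the match (i, j).
            best[i][j] = max(best[i - 1][j], best[i][j - 1], ends[i][j])
    return best[m][n]


if __name__ == "__main__":
    print(flexible_lcs_score("ABCBDAB", "BDCABA", bonus=0))  # classical LCS length
    print(flexible_lcs_score("ABCD", "ABXCD", bonus=1))      # two adjacent match pairs rewarded
```

This sketch runs in O(mn) time and space for inputs of lengths m and n. With `bonus=0` it returns 4 for the pair "ABCBDAB" and "BDCABA", matching the classical LCS length. A larger `bonus` makes contiguous runs of matches worth more than the same number of scattered ones.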
