Finding the gapped longest common subsequence by incremental suffix maximum queries

The longest common subsequence (LCS) problem with gap constraints (the gapped LCS), which has applications in genetics and molecular biology, is an interesting and useful variant of the LCS problem. In previous work, this problem was solved in O(nm) time when the gap constraints are fixed to a single integer, where n and m denote the lengths of the two input sequences A and B, respectively. In this paper, we first generalize the problem from fixed gaps to variable gap constraints. We then devise an optimal approach for the incremental suffix maximum query (ISMQ), which yields an O(nm)-time algorithm for finding the LCS with variable gap constraints. In addition, our technique for ISMQ can be applied to one of the block edit problems on strings, reducing its time complexity from O(nm log m + m^2) to O(nm + m^2). Hence, the results of this paper are beneficial to related research on sequence analysis and stringology.
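To make the ISMQ operation concrete, the following is a minimal sketch, assuming the query means "append a value to the end of an array, then ask for the maximum of the suffix starting at a given index". It combines a stack of current suffix maxima with a disjoint-set forest that uses path compression. The class name `IncrementalSuffixMax` and its methods are hypothetical, and this is not claimed to be the paper's optimal structure: plain path compression gives only amortized logarithmic time per operation, not the constant amortized time an optimal approach would need.

```python
class IncrementalSuffixMax:
    """Illustrative ISMQ sketch: append values, query max(values[i:])."""

    def __init__(self):
        self.values = []   # appended values, in order
        self.parent = []   # disjoint-set forest over indices
        self.stack = []    # indices of suffix maxima; values strictly decrease

    def append(self, v):
        k = len(self.values)
        self.values.append(v)
        self.parent.append(k)
        # Every earlier suffix maximum that v dominates is merged into k,
        # so any query that used to resolve to it now resolves to k.
        while self.stack and self.values[self.stack[-1]] <= v:
            self.parent[self.stack.pop()] = k
        self.stack.append(k)

    def _find(self, i):
        root = i
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at root.
        while self.parent[i] != root:
            nxt = self.parent[i]
            self.parent[i] = root
            i = nxt
        return root

    def query(self, i):
        """Return max(values[i:]) for a valid index i."""
        return self.values[self._find(i)]


ismq = IncrementalSuffixMax()
for x in (5, 3, 4):
    ismq.append(x)
print(ismq.query(0), ismq.query(1), ismq.query(2))
```

In a dynamic-programming setting, each row or column of the table could keep one such structure, appending a cell's value once it is computed and querying the suffix that a gap constraint allows; how the paper wires this into the gapped LCS recurrence is not stated in the abstract.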
