Efficient algorithms for the block edit problems

In this paper, we study the edit distance between two given strings when block-edit operations are allowed, a model that fits natural human editing behavior better than character-level edits alone. Previous results showed that this problem is NP-hard when block moves are allowed, and various approximation algorithms have been proposed in recent years. However, the problem can be solved optimally in polynomial time if some reasonable restrictions are applied. The restricted variations that we consider involve character insertions, character deletions, block copies and block deletions. We define three problems with different measuring functions, P(EIS,C), P(EI,L) and P(EI,N), and show that, with some preprocessing, the minimum block edit distances of these three problems can be obtained by dynamic programming in O(nm), O(nm log m) and O(nm^2) time, respectively, where n and m are the lengths of the two input strings.
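For orientation, the O(nm) dynamic-programming pattern that such algorithms build on can be sketched with the classical character-level edit distance. This is only a baseline under stated assumptions: unit costs for insertion, deletion and substitution, no block copies or block deletions, and a hypothetical function name `edit_distance`. It is not the algorithm of the paper for P(EIS,C), P(EI,L) or P(EI,N), whose measuring functions are not specified in the abstract.

```python
def edit_distance(a: str, b: str) -> int:
    """Classical character-level edit distance (unit costs assumed).

    Baseline illustration only: block copies and block deletions,
    which the paper's problems allow, are NOT modeled here.
    """
    n, m = len(a), len(b)
    # prev[j] = distance between a[:i-1] and b[:j]; row 0 is all insertions.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            delete = prev[j] + 1          # drop a[i-1]
            insert = curr[j - 1] + 1      # insert b[j-1]
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            curr[j] = min(delete, insert, substitute)
        prev = curr
    return prev[m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The table has (n+1)(m+1) cells and each is filled in constant time, which gives the O(nm) bound; keeping only two rows reduces the space to O(m). A block-edit variant would need additional transitions per cell, which is presumably where the preprocessing and the extra log m or m factors mentioned above come in.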
