An algorithm for string edit distance allowing substring reversals

The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y. Ordinarily, string editing is based on character insert, delete, and substitute operations. It has been suggested that extending this model with block (substring) edits would be useful in applications such as DNA sequence comparison. In its general form, the resulting problem is NP-hard. However, there are efficient algorithms when string edits include only character and block replacements. We introduce a new edit model that permits insertions, deletions, and substitutions at the character level, and also reversals of substrings. We present an algorithm whose worst-case time complexity is O(n^2 m), where n = |X| ≤ m = |Y|, and we prove that the average running time of the algorithm is O(nm). Our experiments on randomly generated strings verify these results. The main contribution of this paper is an algorithm that finds all possible reversals using a generalized suffix tree and is fast on average.
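To make the edit model concrete, the following is a minimal sketch of a naive dynamic program for it. It is not the suffix-tree algorithm described above, and it does not achieve the stated running times. The abstract does not specify the cost model, so the sketch assumes the following: every operation costs 1, a reversal applies to a substring of length at least 2, the reversed substring of X must match the corresponding substring of Y exactly, and no other edits occur inside a reversed block. The function name `edit_distance_with_reversals` is hypothetical.

```python
def edit_distance_with_reversals(x: str, y: str) -> int:
    """Naive DP for edit distance with substring reversals.

    Assumptions (not taken from the paper): unit cost per operation,
    reversed blocks have length >= 2, must match exactly after reversal,
    and do not overlap or contain other edits.
    """
    n, m = len(x), len(y)
    # d[i][j] = cost of transforming x[:i] into y[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        d[0][j] = j  # insert all j characters

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + 1,                           # deletion
                d[i][j - 1] + 1,                           # insertion
                d[i - 1][j - 1] + (x[i - 1] != y[j - 1]),  # match or substitution
            )
            # Try ending a reversed block of length k at (i, j).
            for k in range(2, min(i, j) + 1):
                if x[i - k:i][::-1] == y[j - k:j]:
                    best = min(best, d[i - k][j - k] + 1)
            d[i][j] = best
    return d[n][m]
```

Under these assumptions, transforming "abc" into "cba" costs 1 (a single reversal of the whole string), whereas the ordinary character-level edit distance is 2. The inner loop tests every candidate block by direct comparison, which is the step an efficient method would replace with a faster way of locating reversed matches.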
