Edit Distance with Block Operations

We consider edit distance with block operations: we ask for the minimum number of (block) operations needed to transform a string s into a string t. We give O(log n)-approximation algorithms, where n is the total length of the input strings, for the variants of the problem that allow the following sets of operations: block move; block move and block delete; block move and block copy; block move, block copy, and block uncopy. The results still hold if we additionally allow any of the following operations: character insert, character delete, block reversal, or block involution (a generalisation of reversal). Previously, algorithms were known only for the first and last variants, with approximation ratios O(log n log^* n) and O(log n (log^* n)^2), respectively. Edit distance with block moves is equivalent, up to a constant factor, to the common string partition problem, in which we are given two strings s and t and the goal is to partition s into a minimum number of parts that can be permuted to obtain t. Thus we also obtain an O(log n) approximation for this problem, improving on the previous O(log n log^* n). The results rely on a simplification of locally consistent parsing, a previously used technique that groups short substrings of a string into phrases so that similar substrings are guaranteed to be grouped in a similar way. Instead of a sophisticated parsing technique relying on deterministic coin tossing, we use a simple one based on a partition of the alphabet into two subalphabets. In particular, this lowers the running time from O(n log^* n) to O(n). The new algorithms (for the variants with block copy or block delete) follow a similar approach, but their analysis is based on a specially tuned combinatorial function on sets of numbers.
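To illustrate how a partition of the alphabet can drive a locally consistent parsing, here is a minimal sketch. It is not the algorithm of the paper: the function name `pair_phrases`, the parameter `left`, and the single-round pairing rule are assumptions made for illustration, and the sketch does not cover how the partition is chosen or how the parsing is iterated. It assumes the alphabet is split into `left` and its complement, and that two adjacent letters form a phrase exactly when the first is in `left` and the second is not.

```python
def pair_phrases(s, left):
    """Split s into phrases of length 1 or 2.

    Assumption (illustrative only): `left` is one of the two subalphabets.
    Adjacent letters a, b form a phrase when a is in `left` and b is not;
    every other letter is a phrase on its own.
    """
    phrases = []
    i = 0
    while i < len(s):
        # A pair always ends with a letter outside `left`, so that letter
        # can never start another pair: pairs never overlap.
        if i + 1 < len(s) and s[i] in left and s[i + 1] not in left:
            phrases.append(s[i:i + 2])
            i += 2
        else:
            phrases.append(s[i])
            i += 1
    return phrases


if __name__ == "__main__":
    left = {"a", "c"}
    print(pair_phrases("abcabdab", left))
    print(pair_phrases("xcaby", left))
```

Because the decision to pair two letters depends only on those two letters, every occurrence of a substring is parsed the same way except possibly at its first and last letter, which is the kind of guarantee the abstract attributes to locally consistent parsing.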
