Approximate Swapped Matching

Let a text string T of n symbols and a pattern string P of m symbols over an alphabet Σ be given. A swapped version P′ of P is a length-m string derived from P by a series of local swaps (i.e., p′_l ← p_{l+1} and p′_{l+1} ← p_l), where each element can participate in at most one swap. The Pattern Matching with Swaps problem is that of finding all locations i of T for which there exists a swapped version P′ of P that matches T exactly at location i. Recently, efficient algorithms were developed for this problem, with time complexity better than that of the best known algorithms for pattern matching with mismatches. However, the Approximate Pattern Matching with Swaps problem was not known to be solvable faster than the pattern matching with mismatches problem. In the Approximate Pattern Matching with Swaps problem the output is, for every text location i where there is a swapped match of P, the number of swaps necessary to create the swapped version that matches at location i. The fastest method known to date is to count mismatches and divide by two; its time complexity is O(n√(m log m)) for a general alphabet Σ. In this paper we show an algorithm that counts the number of swaps at every location where there is a swapped matching in time O(n log m log σ), where σ = min(m, |Σ|). Consequently, the total time for solving the approximate pattern matching with swaps problem is O(f(n, m) + n log m log σ), where f(n, m) is the time necessary for solving the pattern matching with swaps problem.
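To make the problem statement concrete, the following is a minimal brute-force sketch in Python of the output the Approximate Pattern Matching with Swaps problem asks for. It is not the O(n log m log σ) algorithm of the paper; it runs in O(nm) time and only illustrates the definitions. The function name `swap_match_counts` and the greedy left-to-right scan are assumptions made for this illustration. The scan relies on the observation that swapping two equal adjacent symbols changes nothing, so a swap is only ever needed where the pattern and text disagree.

```python
def swap_match_counts(text: str, pattern: str) -> dict[int, int]:
    """Brute-force illustration (hypothetical helper, O(n*m) time).

    Returns {i: swaps} for every location i of `text` at which some
    swapped version of `pattern` matches exactly, where `swaps` is the
    number of adjacent swaps used. Each pattern position takes part in
    at most one swap.
    """
    n, m = len(text), len(pattern)
    result: dict[int, int] = {}
    for i in range(n - m + 1):
        j = 0
        swaps = 0
        matched = True
        while j < m:
            if pattern[j] == text[i + j]:
                # Symbols agree: no swap is needed at this position.
                j += 1
            elif (
                j + 1 < m
                and pattern[j] == text[i + j + 1]
                and pattern[j + 1] == text[i + j]
            ):
                # Swapping pattern positions j and j+1 fixes both symbols.
                swaps += 1
                j += 2
            else:
                # No swapped version of the pattern can match here.
                matched = False
                break
        if matched:
            result[i] = swaps
    return result


if __name__ == "__main__":
    print(swap_match_counts("babaabab", "abab"))
```

For the text `babaabab` and pattern `abab`, the sketch reports a swapped match at location 0 with two swaps, at location 2 with one swap, and at location 4 with zero swaps. At each reported location the swap count equals half the number of mismatching positions, which is the "count mismatches and divide by two" method mentioned above.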
