Approximate parameterized matching

Two equal-length strings <i>s</i> and <i>s</i>′, over alphabets Σ<sub><i>s</i></sub> and Σ<sub><i>s</i>′</sub>, <i>parameterize match</i> if there exists a bijection π : Σ<sub><i>s</i></sub> → Σ<sub><i>s</i>′</sub> such that π(<i>s</i>) = <i>s</i>′, where π(<i>s</i>) is the string obtained by renaming each character of <i>s</i> via π. <i>Parameterized matching</i> is the problem of finding all parameterized matches of a pattern string <i>p</i> in a text <i>t</i>. <i>Approximate parameterized matching</i> is the problem of finding, at each location, a bijection π that maximizes the number of characters of <i>p</i> that are mapped to the corresponding characters of the |<i>p</i>|-length substring of <i>t</i> at that location. Parameterized matching was introduced as a model for software duplication detection in software maintenance systems, and it also has applications in image processing and computational biology. For example, approximate parameterized matching models image searching with variable color maps in the presence of errors. We consider the variant in which an error threshold <i>k</i> is given, and the goal is to find all locations in <i>t</i> for which there exists a bijection π that maps <i>p</i> into the corresponding |<i>p</i>|-length substring of <i>t</i> with at most <i>k</i> mismatches. Our main result is an algorithm for this problem with <i>O</i>(<i>nk</i><sup>1.5</sup> + <i>mk</i> log <i>m</i>) time complexity, where <i>m</i> = |<i>p</i>| and <i>n</i> = |<i>t</i>|. We also show that when |<i>p</i>| = |<i>t</i>| = <i>m</i>, the problem is equivalent to the maximum matching problem on graphs, yielding an <i>O</i>(<i>m</i> + <i>k</i><sup>1.5</sup>) solution.
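The definitions above can be made concrete with a small Python sketch. It is not the paper's O(nk^1.5 + mk log m) algorithm: it scans every window of t naively and, for the approximate case, finds the best bijection per window with a generic Hungarian (maximum-weight assignment) routine. This assumes the equal-length case is read as a weighted bipartite matching problem: symbols of p on one side, symbols of the window on the other, and edge (a, b) weighted by the number of positions where a in p is aligned with b. All function names are hypothetical.

```python
from collections import Counter


def is_p_match(s, t):
    """Return True if some bijection pi between the symbols of s and t gives pi(s) == t."""
    if len(s) != len(t):
        return False
    fwd, bwd = {}, {}  # pi and its inverse, built left to right
    for a, b in zip(s, t):
        # setdefault keeps the first pairing seen; a later conflict breaks the bijection
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True


def p_match_locations(p, t):
    """Naive O(nm) scan: every 0-based offset where p parameterize-matches t."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if is_p_match(p, t[i:i + m])]


def max_weight_assignment(w):
    """Maximum total weight of a perfect assignment on a square matrix w.

    Standard O(s^3) Hungarian algorithm run on costs -w, with potentials u, v.
    """
    n = len(w)
    if n == 0:
        return 0
    inf = float("inf")
    u, v = [0] * (n + 1), [0] * (n + 1)
    match = [0] * (n + 1)  # match[j] = row (1-based) assigned to column j
    way = [0] * (n + 1)
    for i in range(1, n + 1):
        match[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = -w[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:  # flip the augmenting path back to the root column 0
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sum(w[match[j] - 1][j - 1] for j in range(1, n + 1))


def min_mismatches(p, t):
    """Fewest mismatched positions over all bijections, assuming len(p) == len(t)."""
    if len(p) != len(t):
        raise ValueError("p and t must have equal length")
    # Edge (a, b) weighs the number of positions where a in p is aligned with b in t.
    pairs = Counter(zip(p, t))
    rows = {a: i for i, a in enumerate(sorted(set(p)))}
    cols = {b: j for j, b in enumerate(sorted(set(t)))}
    size = max(len(rows), len(cols))
    w = [[0] * size for _ in range(size)]  # zero padding = symbol left unmapped
    for (a, b), count in pairs.items():
        w[rows[a]][cols[b]] = count
    return len(p) - max_weight_assignment(w)


def approx_p_match_locations(p, t, k):
    """Offsets whose window admits a bijection with at most k mismatches (naive scan)."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if min_mismatches(p, t[i:i + m]) <= k]


if __name__ == "__main__":
    print(p_match_locations("aba", "xyxzzq"))            # exact matches only
    print(approx_p_match_locations("aba", "xyxzzq", 1))  # allow one mismatch
```

For a window of length m, this costs O(m) to build the weights plus O(s^3) for the assignment, where s ≤ m is the larger alphabet size. That is far from the bounds stated above, which exploit the threshold k. The sketch is only meant to show what "best bijection with at most k mismatches" means operationally.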