A Filtering Technique for All Pairs Approximate Parameterized String Matching

This paper addresses the all pairs approximate parameterized string matching problem with error threshold \(k\) between two sets of equal-length strings. Let \(P=\{p_1, p_2, \ldots , p_{n_P}\} \subseteq \varSigma _P^m\) and \(T=\{t_1, t_2, \ldots , t_{n_T}\} \subseteq \varSigma _T^m\) be two sets of strings, where \(|\varSigma _P|=|\varSigma _T|\). For each \(p_i \in P\), the problem is to find the string \(t_j \in T\) that is the closest approximate parameterized match to \(p_i\) within the threshold \(k\). The solution runs in \(O(n_P \, n_T \, m)\) time. We introduce a Parikh vector (PV) filtering technique that preprocesses the given strings and avoids unnecessary pairwise comparisons. PV-filtering does not change the asymptotic time complexity, but experiments show that it substantially improves running time for small error thresholds.
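To make the filtering idea concrete, the following Python sketch shows one way a Parikh vector filter of this kind could work. It is an illustration under stated assumptions, not the paper's implementation: it assumes the distance is the parameterized Hamming distance (the minimum number of mismatches over all bijections \(\varSigma _P \rightarrow \varSigma _T\)), and all function names are hypothetical. The bound relies on the observation that a symbol occurring \(x\) times in \(p\), mapped to a symbol occurring \(y\) times in \(t\), can contribute at most \(\min(x, y)\) matching positions, and that pairing the two count vectors in sorted order maximizes the sum of these minima.

```python
from collections import Counter
from itertools import permutations


def sorted_parikh(s):
    """Symbol counts of s in descending order (symbol identities are dropped)."""
    return sorted(Counter(s).values(), reverse=True)


def pv_lower_bound(a, b):
    """Lower bound on the parameterized Hamming distance of two equal-length
    strings, computed from their sorted count vectors a and b.
    Assumption: this bound illustrates the idea; the paper's filter may differ."""
    width = max(len(a), len(b))
    a = a + [0] * (width - len(a))  # absent symbols have count 0
    b = b + [0] * (width - len(b))
    # Both vectors sum to m, so the total absolute difference is even.
    return sum(abs(x - y) for x, y in zip(a, b)) // 2


def candidate_pairs(P, T, k):
    """Yield index pairs (i, j) that the filter cannot rule out at threshold k."""
    pv_T = [sorted_parikh(t) for t in T]  # preprocess T once
    for i, p in enumerate(P):
        pv_p = sorted_parikh(p)
        for j, pv_t in enumerate(pv_T):
            if pv_lower_bound(pv_p, pv_t) <= k:
                yield i, j


def exact_distance_bruteforce(p, t, sigma_p, sigma_t):
    """Exact parameterized Hamming distance by trying every bijection.
    Only for checking the bound on tiny alphabets; not the paper's algorithm."""
    best = len(p)
    for image in permutations(sigma_t):
        pi = dict(zip(sigma_p, image))
        best = min(best, sum(pi[x] != y for x, y in zip(p, t)))
    return best
```

For example, with `P = ["aabb", "abab"]`, `T = ["xxxx", "xyxy"]` and `k = 0`, the filter discards both pairs involving `"xxxx"` (their lower bound is 2) and keeps both pairs involving `"xyxy"`. The pair `("aabb", "xyxy")` survives the filter even though its exact distance is 2, which shows that the filter is a necessary condition only: surviving pairs still need the full comparison.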
