Multiple Approximate String Matching

We present two new algorithms for on-line multiple approximate string matching. Both extend earlier algorithms that search for a single pattern. The single-pattern version of the first is based on a bit-parallel simulation of a non-deterministic finite automaton that is built from the pattern and takes the text as input. To search for multiple patterns, we superimpose their automata and use the result as a filter. The second algorithm partitions the pattern into sub-patterns that are searched without errors using a fast exact multipattern search algorithm. To handle multiple patterns, we search the sub-patterns of all of them together. In both cases the average running time is O(n) for moderate error levels, pattern lengths, and numbers of patterns. Both algorithms adapt, at higher cost, to the remaining cases, but they differ in speed and in the range of parameters over which they are useful. We analyze theoretically when each algorithm should be used, and show experimentally that they are faster than previous solutions in a wide range of cases.
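As an illustration of the second idea, the following Python sketch splits each pattern into k+1 pieces, searches all pieces exactly, and verifies each candidate region with the classical dynamic-programming recurrence. If a pattern occurs with at most k errors, at least one of its k+1 pieces must occur unaltered, so the exact search acts as a filter. This is a sketch under stated assumptions and not the paper's implementation: the function names (`split_pieces`, `ends_within_k`, `multi_search`) are hypothetical, the same error bound `k` is assumed for every pattern, and `str.find` stands in for the fast exact multipattern search algorithm mentioned above.

```python
def split_pieces(pattern, k):
    """Split pattern into k+1 contiguous pieces; return (offset, piece) pairs."""
    parts = k + 1
    base, extra = divmod(len(pattern), parts)
    pieces, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        pieces.append((start, pattern[start:start + length]))
        start += length
    return pieces


def ends_within_k(pattern, window, k):
    """Return end positions (exclusive) in window where pattern matches
    with at most k edit errors, using column-wise dynamic programming."""
    m = len(pattern)
    col = list(range(m + 1))      # column for the empty text prefix
    ends = []
    for j, ch in enumerate(window):
        prev_diag = col[0]        # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(prev_diag + (pattern[i - 1] != ch),  # match/substitute
                         old + 1,                             # extra text char
                         col[i - 1] + 1)                      # missing text char
            prev_diag = old
        if col[m] <= k:
            ends.append(j + 1)
    return ends


def multi_search(patterns, text, k):
    """Return sorted (pattern_index, end_position) pairs for all patterns
    matching in text with at most k errors."""
    # Pool the pieces of all patterns so they are searched together.
    piece_owners = {}
    for p_id, pat in enumerate(patterns):
        if len(pat) <= k:
            raise ValueError("each pattern must be longer than k")
        for offset, piece in split_pieces(pat, k):
            piece_owners.setdefault(piece, []).append((p_id, offset))

    matches = set()
    for piece, owners in piece_owners.items():
        pos = text.find(piece)    # stand-in for a real multipattern searcher
        while pos != -1:
            for p_id, offset in owners:
                pat = patterns[p_id]
                # Any occurrence containing this piece lies in [lo, hi).
                lo = max(0, pos - offset - k)
                hi = min(len(text), pos - offset + len(pat) + k)
                for end in ends_within_k(pat, text[lo:hi], k):
                    matches.add((p_id, lo + end))
            pos = text.find(piece, pos + 1)
    return sorted(matches)
```

A call such as `multi_search(["hello", "world"], "say helo wurld", 1)` returns pairs of pattern index and end position in the text. The filter pays off when pieces are long enough to be rare in the text; as the error level grows, pieces shrink, more candidate regions need verification, and the approach degrades toward plain dynamic programming.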
