New and faster filters for multiple approximate string matching

We present three new algorithms for on-line multiple string matching allowing errors. They extend previous algorithms that search for a single pattern. In all cases the average running time is linear in the text size for moderate error levels, pattern lengths, and numbers of patterns. The algorithms adapt, at higher cost, to the other cases. However, they differ in speed and in their thresholds of usefulness. We analyze theoretically when each algorithm should be used and show their performance experimentally. The only previous solution to this problem allows just one error. Our algorithms are the first to allow more errors, and they are faster than previous work for a moderate number of patterns (e.g., fewer than 50-100 on English text, depending on the pattern length).
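To make the problem statement concrete, the following minimal sketch shows a naive baseline, not any of the three algorithms summarized above. It assumes the edit-distance error model and runs the classical column-wise dynamic programming once per pattern, reporting every text position where an occurrence with at most k errors ends. The function names `approx_find` and `multi_approx_find` are hypothetical and chosen only for illustration. Its cost grows with the number of patterns times the pattern length times the text length, which is the kind of cost that filtering approaches try to avoid.

```python
def approx_find(pattern, text, k):
    """Return 1-based end positions in text where pattern occurs with <= k edit errors.

    Classical dynamic programming for approximate searching: col[i] holds the
    minimum edit distance between pattern[:i] and any substring of the text
    ending at the current position. col[0] is always 0, so an occurrence may
    start anywhere in the text.
    """
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value of the previous column at row i-1
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def multi_approx_find(patterns, text, k):
    """Naive multipattern baseline (illustrative assumption): search each pattern separately."""
    return {p: approx_find(p, text, k) for p in patterns}


if __name__ == "__main__":
    print(multi_approx_find(["abc", "xyz"], "xabdx", 1))
```

Because each pattern is searched independently, the running time of this baseline scales linearly with the number of patterns; it is useful mainly as a correctness reference against which faster multipattern methods can be checked.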
