Text Indexing and Dictionary Matching with One Error

In the indexing problem, a text is preprocessed so that subsequent queries of the form “Find all occurrences of pattern P in the text” are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem, a set of patterns is preprocessed so that subsequent queries of the form “Find all occurrences of dictionary patterns in text T” are answered in time proportional to the length of the text and the number of occurrences.

There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e., where the pattern is within a bounded edit (or Hamming) distance from the appropriate text location.

In this paper we present a uniform deterministic solution to both the indexing and the general dictionary matching problem with one error. We preprocess the data in time O(n log^2 n), where n is the text size in the indexing problem and the dictionary size in the dictionary matching problem. Our query time for the indexing problem is O(m log n log log n + tocc), where m is the query string size and tocc is the number of occurrences. Our query time for the dictionary matching problem is O(n log^3 d log log d + tocc), where n is the text size and d the dictionary size. The time bounds above apply to both bounded and unbounded alphabets.
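To make the query semantics concrete, the following is a brute-force sketch of what "one error" queries return, restricted to the Hamming (single mismatch) case. It is not the algorithm of the paper and comes nowhere near the bounds stated above: it does no preprocessing and scans every alignment. The function names are hypothetical and chosen only for illustration.

```python
def hamming_at_most_one(a: str, b: str) -> bool:
    """Return True if equal-length strings a and b differ in at most one position."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > 1:
                return False
    return True


def index_query_one_mismatch(text: str, pattern: str) -> list[int]:
    """Indexing query, brute force: start positions where pattern occurs
    in text with at most one mismatch. Takes O(len(text) * len(pattern)) time."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming_at_most_one(text[i:i + m], pattern)
    ]


def dictionary_query_one_mismatch(patterns: list[str], text: str) -> list[tuple[int, str]]:
    """Dictionary matching query, brute force: (position, pattern) pairs for
    every dictionary pattern occurring in text with at most one mismatch."""
    return [
        (i, p)
        for p in patterns
        for i in index_query_one_mismatch(text, p)
    ]
```

For example, querying the text `abracadabra` with the pattern `aca` reports position 3 (an exact occurrence) and position 5 (`ada`, one mismatch). A sketch for edit distance would also have to allow a single insertion or deletion at each alignment.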
