Dictionary Look-Up with Small Errors

Let W be a set of n binary strings, each of length m. We are interested in designing data structures for W that answer d-queries quickly; that is, given a binary string α of length m, decide whether some member of W lies within Hamming distance d of α. This problem, originally raised by Minsky and Papert [MP], remains a challenge in data structure design. In this paper, we take a first step toward a theoretical study of the case of small d. Our main result is a data structure for d = 1 that achieves O(m log log n) query time using O(nm log m) space.
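To make the query concrete, here is a minimal Python sketch of two simple baselines. It is not the data structure of this paper, whose construction is not described here. The function names `hamming`, `d_query_scan` and `one_query_hash` are hypothetical, and Python strings over {"0", "1"} stand in for the binary strings of W. `d_query_scan` answers a d-query directly by comparing α with every member of W, which takes O(nm) time. `one_query_hash` handles d = 1 by storing W in a hash set and probing α together with its m one-bit neighbours; that is m + 1 lookups, but building and hashing each m-character neighbour costs O(m), so a query takes O(m²) expected time in this simple form, compared with the O(m log log n) bound stated above.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    return sum(x != y for x, y in zip(a, b))


def d_query_scan(words: set[str], alpha: str, d: int) -> bool:
    """Brute-force d-query (illustrative baseline): scan all n members of W.

    O(nm) time per query, no preprocessing beyond storing W.
    """
    return any(hamming(w, alpha) <= d for w in words)


def one_query_hash(words: set[str], alpha: str) -> bool:
    """Baseline 1-query (illustrative, not the paper's structure).

    Probe alpha itself and each of its m single-bit flips in a hash set.
    Each neighbour is a fresh m-character string, so the total work is
    O(m^2) expected time with O(nm) bits of storage for W.
    """
    if alpha in words:  # distance 0
        return True
    flip = {"0": "1", "1": "0"}
    for i in range(len(alpha)):
        # Build the string that differs from alpha only at position i.
        neighbour = alpha[:i] + flip[alpha[i]] + alpha[i + 1:]
        if neighbour in words:  # distance exactly 1
            return True
    return False


if __name__ == "__main__":
    W = {"0000", "1111", "0110"}
    print(one_query_hash(W, "0100"))   # True: "0000" is one flip away
    print(one_query_hash(W, "1001"))   # False: nearest members are at distance 2
    print(d_query_scan(W, "1001", 2))  # True
```

The hash-set baseline trades the linear dependence on n for a quadratic dependence on m; the interest of the result above is in bringing the query cost close to linear in m while keeping space near O(nm).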
