Paper Information - Approximate Multiple Strings Search

Approximate Multiple Strings Search

This paper presents a fast algorithm for searching a large text for multiple strings allowing one error. On a fast workstation, the algorithm can process a megabyte of text searching for 1000 patterns (with one error) in less than a second. Although we combine several interesting techniques, overall the algorithm is not deep theoretically. The emphasis of this paper is on the experimental side of algorithm design. We show the importance of careful design, experimentation, and utilization of current architectures. In particular, we discuss the issues of locality and cache performance, fast hash functions, and incremental hashing techniques. We introduce the notion of two-level hashing, which utilizes cache behavior to speed up hashing, especially in cases where unsuccessful searches are not uncommon. Two-level hashing may be useful for many other applications. The end result is also interesting by itself. We show that multiple search with one error is fast enough for most text applications.
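The abstract names two-level hashing but does not spell out its structure. As a minimal sketch of one plausible reading, assume the first level is a small bit table (small enough to stay in cache) that is probed before the full hash table, so that most unsuccessful searches never touch the larger structure. The class name `TwoLevelSet`, the `filter_bits` parameter, and the use of Python's built-in `hash` are illustrative assumptions, not the authors' implementation.

```python
class TwoLevelSet:
    """Membership test with a small first-level bit filter in front of a full table.

    Illustrative only: the layout and the hash function are assumptions.
    """

    def __init__(self, keys, filter_bits=1 << 16):
        # filter_bits is assumed to be a multiple of 8.
        self.filter_bits = filter_bits
        self.filter = bytearray(filter_bits // 8)  # level 1: compact bit table
        self.table = set()                         # level 2: full hash table
        for key in keys:
            h = hash(key) % filter_bits
            self.filter[h >> 3] |= 1 << (h & 7)
            self.table.add(key)

    def __contains__(self, key):
        h = hash(key) % self.filter_bits
        # Level 1: if the bit is clear, the key is definitely absent.
        if not self.filter[h >> 3] & (1 << (h & 7)):
            return False
        # Level 2: the bit may be set by a different key, so confirm in the full table.
        return key in self.table


patterns = TwoLevelSet(["manber", "muth", "hash"])
print("muth" in patterns)   # True
print("knuth" in patterns)  # False
```

The first level can report false positives but never false negatives, so the second lookup is only needed when the bit is set. The benefit depends on unsuccessful searches being frequent, which is the case the abstract highlights.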

Udi Manber | Robert Muth

[1] Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[2] Udi Manber, et al. An Algorithm for Approximate Membership Checking with Application to Password Security, 1994, Inf. Process. Lett.

[3] Aviezri S. Fraenkel, et al. A Hash Code Method for Detecting and Correcting Spelling Errors, 1982, CACM.

[4] Udi Manber, et al. Fast Text Searching: Allowing Errors, 1992, CACM.

[5] Beate Commentz-Walter, et al. A String Matching Algorithm Fast on the Average, 1979, ICALP.

[6] Bowen Alpern, et al. A Model for Hierarchical Memory, 1987, STOC.