Practical and Optimal String Matching

We develop a new exact bit-parallel string matching algorithm based on the Shift-Or algorithm (Baeza-Yates & Gonnet, 1992). Assuming that the pattern representation fits into a single computer word, the algorithm has optimal $O(n \log_\sigma m / m)$ average running time and optimal $O(n)$ worst-case running time, where $n$, $m$ and $\sigma$ are the sizes of the text, the pattern, and the alphabet, respectively. We also study several implementation details. The experimental results show that our algorithm is the fastest in most cases where it is applicable, displacing even the long-standing BNDM (Navarro & Raffinot, 2000) family of algorithms. Finally, we show how to adapt our techniques to the Shift-Add algorithm (Baeza-Yates & Gonnet, 1992), obtaining optimal time for searching under Hamming distance.
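
For readers unfamiliar with the base technique, here is a minimal sketch of the classic Shift-Or algorithm (Baeza-Yates & Gonnet, 1992). It is not the new average-optimal algorithm described above. It assumes the pattern fits in one machine word, emulated here by masking Python's unbounded integers to m bits. The function name `shift_or_search` is hypothetical.

```python
def shift_or_search(text: str, pattern: str) -> list[int]:
    """Return the start positions of all exact occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    mask = (1 << m) - 1  # emulate an m-bit machine word
    # B[c] has bit i cleared iff pattern[i] == c (Shift-Or uses 0 to mean "match").
    B: dict[str, int] = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, mask) & ~(1 << i)
    D = mask             # all ones: no pattern prefix matched yet
    hit = 1 << (m - 1)   # bit m-1 cleared means the whole pattern matched
    matches = []
    for j, c in enumerate(text):
        # Bit i of D is 0 iff pattern[0..i] matches the text ending at position j.
        D = ((D << 1) | B.get(c, mask)) & mask
        if (D & hit) == 0:
            matches.append(j - m + 1)
    return matches


print(shift_or_search("abracadabra", "abra"))  # [0, 7]
```

This baseline reads every text character once, so it always takes time linear in n for patterns that fit in a word. On its own, it does not reach the sublinear average-case bound stated above.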

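The following is a similar sketch of the classic Shift-Add idea (Baeza-Yates & Gonnet, 1992) for searching under Hamming distance, that is, reporting alignments with at most k mismatches. It is the unmodified baseline, not the optimal adaptation the abstract refers to. The function name `shift_add_search` is hypothetical. To keep the sketch short, each counter field is assumed to be b = m.bit_length() bits wide, so a count can never overflow into the next field. Tighter implementations typically use narrower fields plus an overflow bit.

```python
def shift_add_search(text: str, pattern: str, k: int) -> list[int]:
    """Return start positions where pattern occurs in text with at most k mismatches."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    b = m.bit_length()            # field width: a count of up to m mismatches fits
    field = (1 << b) - 1          # mask for a single field
    mask = (1 << (m * b)) - 1     # m fields of b bits each
    ones = sum(1 << (i * b) for i in range(m))  # a 1 in every field
    # T[c] has field i equal to 1 iff pattern[i] != c.
    T: dict[str, int] = {
        c: sum(1 << (i * b) for i, p in enumerate(pattern) if p != c)
        for c in set(pattern)
    }
    D = 0
    matches = []
    for j, c in enumerate(text):
        # Field i of D counts mismatches between pattern[0..i] and text[j-i..j] (once j >= i).
        D = ((D << b) + T.get(c, ones)) & mask
        # Only field m-1 spans the whole pattern; it is complete once j >= m-1.
        if j >= m - 1 and ((D >> ((m - 1) * b)) & field) <= k:
            matches.append(j - m + 1)
    return matches


print(shift_add_search("abracadabra", "abra", 3))  # [0, 2, 3, 4, 5, 7]
```

With k = 0 the function reports exactly the occurrences found by Shift-Or, at the cost of b bits per pattern position instead of one.
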
References

[1] Gonzalo Navarro, et al. NR-grep: a fast and flexible pattern-matching tool. Softw. Pract. Exp., 2001.

[2] Tadao Takaoka, et al. Approximate Pattern Matching with Samples. ISAAC, 1994.

[3] Udi Manber, et al. Fast text searching: allowing errors. CACM, 1992.

[4] Jorma Tarhio, et al. Alternative Algorithms for Bit-Parallel String Matching. SPIRE, 2003.

[5] Gonzalo Navarro, et al. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM J. Exp. Algorithmics, 2000.

[6] Donald E. Knuth, et al. Fast Pattern Matching in Strings. SIAM J. Comput., 1977.

[7] Wojciech Rytter, et al. Text Algorithms. 1994.

[8] R. Nigel Horspool, et al. Practical fast searching in strings. Softw. Pract. Exp., 1980.

[9] Ricardo A. Baeza-Yates, et al. A new approach to text searching. CACM, 1992.

[10] Robert S. Boyer, et al. A fast string searching algorithm. CACM, 1977.

[11] Paul G. Spirakis, et al. Algorithms - ESA '95. Lecture Notes in Computer Science, 1995.

[12] Gonzalo Navarro, et al. Bit-parallel (delta, gamma)-matching and suffix automata. J. Discrete Algorithms, 2005.

[13] Andrew Chi-Chih Yao, et al. The Complexity of Pattern Matching for a Random String. SIAM J. Comput., 1977.

[14] Daniel Sunday, et al. A very fast substring search algorithm. CACM, 1990.

[15] Gonzalo Navarro, et al. A guided tour to approximate string matching. CSUR, 2001.

[16] Erkki Sutinen, et al. On Using q-Gram Locations in Approximate String Matching. ESA, 1995.

[17] Binxing Fang, et al. Linear Nondeterministic Dawg String Matching Algorithm. SPIRE, 2004.