BLIM: A New Bit-Parallel Pattern Matching Algorithm Overcoming Computer Word Size Limitation

Bitwise operations execute very quickly on computer hardware, and algorithms designed to exploit this intrinsic property are classified as bit-parallel algorithms. Bit-parallelism has been widely investigated in pattern matching since the introduction of the Shift-Or algorithm. The original idea has no shift mechanism, and it requires the input pattern to be no longer than the computer word size (W) to benefit from the full power of bit-parallelism. Succeeding algorithms of this genre added a shift mechanism, but the W limitation has not been overcome in an elegant way. This study proposes a new bit-parallel algorithm for exact pattern matching, named BLIM (bit-parallel length independent matching), that does not restrict the input pattern to be shorter than the word size. The multiple-pattern case is also addressed: it is shown that up to W patterns, whatever their lengths, can be searched simultaneously in a single bit-parallel framework. Like other algorithms of this genre, BLIM can also handle fixed-length gaps and character classes in the input strings. The proposed algorithm is compared with the alternatives of its class, mainly the Shift-Or and BNDM variants. Experimental results indicate that BLIM is comparable in performance to the previous bit-parallel algorithms, with the additional gain of overcoming the word size limitation.
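For readers unfamiliar with the word size limitation, the following is a minimal sketch of the classic Shift-Or scheme that the abstract refers to, not of BLIM itself. The function name `shift_or`, the constant `W = 64`, and the explicit length check are assumptions made for illustration. Python integers are unbounded, so the sketch masks the state to W bits to imitate a machine word.

```python
W = 64  # assumed machine word size in bits (illustrative)


def shift_or(pattern: str, text: str) -> list[int]:
    """Return the start positions of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0 or m > W:
        # The state vector needs one bit per pattern position, so a
        # single-word implementation cannot hold patterns longer than W.
        raise ValueError("pattern length must be between 1 and W")

    word_mask = (1 << W) - 1

    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, word_mask) & ~(1 << i)

    state = word_mask          # all ones: no prefix matched yet
    match_bit = 1 << (m - 1)   # bit that signals a full match when cleared
    hits = []
    for j, ch in enumerate(text):
        # One shift and one OR per text character, regardless of m.
        state = ((state << 1) | masks.get(ch, word_mask)) & word_mask
        if state & match_bit == 0:
            hits.append(j - m + 1)
    return hits
```

Calling `shift_or("aba", "ababa")` reports the two overlapping occurrences, while a 65-character pattern is rejected by the length check, which is the restriction that BLIM is designed to remove.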
