Speeding Up Pattern Matching by Text Sampling

We introduce a novel alphabet sampling technique for speeding up both online and indexed string matching. We choose a subset of the alphabet and extract the subsequence of the text formed by the characters in that subset. Online or indexed searching is then carried out on this subsequence, and candidate matches are verified against the full text. We show that this speeds up online searching by a factor of up to 5, especially for moderate to long patterns. For indexed searching we obtain indexes that are as fast as the classical suffix array, yet occupy less than 0.5 times the text size (instead of 4 times), in addition to the text itself. Our experiments show no competitive alternatives across a wide range of space/time tradeoffs.
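
The online variant of this idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_sample`, `find_all`, `sampled_search`) are hypothetical, the built-in `str.find` stands in for whatever search algorithm is run on the sampled text, and the sketch assumes that the original position of every sampled character is stored, which is a simplification chosen for clarity rather than space.

```python
def build_sample(text, sampled):
    """Return the sampled subsequence and the text position of each kept character."""
    positions = [i for i, c in enumerate(text) if c in sampled]
    sampled_text = "".join(text[i] for i in positions)
    return sampled_text, positions


def find_all(haystack, needle):
    """All (possibly overlapping) start positions of needle in haystack."""
    hits = []
    i = haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def sampled_search(text, pattern, sampled):
    """Search the sampled text, then verify each candidate in the full text."""
    sampled_text, positions = build_sample(text, sampled)

    # Keep only the pattern characters that belong to the sampled alphabet.
    kept = [k for k, c in enumerate(pattern) if c in sampled]
    if not kept:
        # Assumption: with no sampled character in the pattern, fall back to a plain scan.
        return find_all(text, pattern)

    offset = kept[0]  # distance from pattern start to its first sampled character
    sampled_pattern = "".join(pattern[k] for k in kept)

    result = []
    for j in find_all(sampled_text, sampled_pattern):
        start = positions[j] - offset  # candidate start in the full text
        if start >= 0 and text.startswith(pattern, start):
            result.append(start)  # candidate confirmed by full verification
    return result
```

For example, `sampled_search("abracadabra", "abra", {"b", "r"})` searches for `"br"` in the sampled text `"brbr"`, maps the two hits back to text positions, and returns `[0, 7]` after verification. Every true occurrence is found because the sampled characters inside an occurrence are exactly the sampled characters of the pattern, so they appear contiguously in the sampled text.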
