2001-2010: Ten Years of Exact String Matching Algorithms

The online exact string matching problem consists in finding all occurrences of a given pattern p in a text t. It is an extensively studied problem in computer science, mainly because of its direct applications to areas as diverse as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, and computational biology and chemistry. In the last decade more than 50 new algorithms have been proposed for the problem, adding to the almost 40 algorithms presented before 2000 [1]. We review the most efficient string matching algorithms presented in the last decade in order to bring order among the dozens of articles published in this area. We compared 85 exact string matching algorithms on 12 texts of different types [4]. We divide the patterns into four classes according to their length m: very short (m ≤ 4), short (4 < m ≤ 32), long (32 < m ≤ 256) and very long (m > 256). We proceed in the same way for the alphabets according to their size σ: very small (σ < 4), small (4 ≤ σ < 32), large (32 ≤ σ < 128) and very large (σ > 128). According to our experimental results (see Figure 1), we conclude that the following algorithms are the most efficient in the following situations:

Thierry Lecroq | Simone Faro
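To make the problem statement concrete (finding all occurrences of a pattern p in a text t), here is a minimal Python sketch of the brute-force algorithm, which tests every alignment of p against t. It is only an illustrative baseline, not one of the efficient algorithms reviewed in the paper. The function name find_all_occurrences is hypothetical, and reporting 0-based, possibly overlapping positions (and nothing for an empty pattern) are assumptions of this sketch.

```python
def find_all_occurrences(p: str, t: str) -> list:
    """Return every 0-based position i such that t[i:i+len(p)] == p.

    Brute-force baseline (hypothetical helper): overlapping occurrences
    are reported, and an empty pattern is assumed to have no occurrences.
    """
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    occurrences = []
    # Slide a window of length m over t, shifting by one position each time.
    for i in range(n - m + 1):
        # Compare pattern and window character by character, left to right.
        j = 0
        while j < m and p[j] == t[i + j]:
            j += 1
        if j == m:  # every character matched: an occurrence starts at i
            occurrences.append(i)
    return occurrences


if __name__ == "__main__":
    print(find_all_occurrences("aba", "abababa"))  # [0, 2, 4]
```

This approach costs O(nm) character comparisons in the worst case; the algorithms surveyed in the paper aim to skip alignments that cannot match, which is where their speedups come from.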

[1] Udi Manber, et al. Fast text searching: allowing errors, 1992, CACM.

[2] Thierry Lecroq, et al. Handbook of Exact String Matching Algorithms, 2004.

[3] Jorma Tarhio, et al. Alternative Algorithms for Bit-Parallel String Matching, 2003, SPIRE.

[4] Thierry Lecroq, et al. Efficient Variants of the Backward-Oracle-Matching Algorithm, 2008, Stringology.

[5] Frantisek Franek, et al. A simple fast hybrid pattern-matching algorithm, 2007, J. Discrete Algorithms.

[6] Thierry Lecroq, et al. The exact online string matching problem: A review of the most recent results, 2013, CSUR.

[7] Thierry Lecroq, et al. The Exact String Matching Problem: a Comprehensive Experimental Evaluation, 2010, ArXiv.

[8] Thierry Lecroq, et al. Fast exact string matching algorithms, 2007, Inf. Process. Lett.

[9] M. Oguzhan Külekci. Filter Based Fast Matching of Long Patterns by Using SIMD Instructions, 2009, Stringology.

[10] Rahul Thathoo, et al. TVSBS: a fast exact pattern matching algorithm for biological sequences, 2006.