The Exact String Matching Problem: a Comprehensive Experimental Evaluation

This paper addresses the online exact string matching problem, which consists of finding all occurrences of a given pattern p in a text t. It is an extensively studied problem in computer science, mainly because of its direct applications in areas as diverse as text, image and signal processing, speech analysis and recognition, data compression, information retrieval, computational biology and chemistry. Since 1970 more than 80 string matching algorithms have been proposed, more than 50% of them in the last ten years. In this note we present a comprehensive list of all string matching algorithms and report experimental results that compare them from a practical point of view. Our experimental evaluation shows that the relative performance of the algorithms varies considerably with alphabet size and pattern length.
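The following minimal Python sketch makes the problem statement concrete. It is not the authors' benchmark code. `naive_search` is a brute-force reference definition of "all occurrences" (0-based positions, overlaps allowed). `horspool_search` is Horspool's bad-character simplification of Boyer-Moore, used here only as one example of the many practical algorithms such an evaluation compares. Both function names are illustrative.

```python
def naive_search(p: str, t: str) -> list[int]:
    """Brute force: every 0-based i with t[i:i+m] == p (O(nm) worst case)."""
    m, n = len(p), len(t)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    return [i for i in range(n - m + 1) if t[i:i + m] == p]


def horspool_search(p: str, t: str) -> list[int]:
    """Horspool's algorithm: same output as naive_search, usually fewer comparisons."""
    m, n = len(p), len(t)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Bad-character table built from p[0..m-2]: distance from the rightmost
    # occurrence of each character to the last pattern position.
    # Characters absent from p[0..m-2] shift by the full length m.
    shift = {}
    for i in range(m - 1):
        shift[p[i]] = m - 1 - i
    occurrences = []
    s = 0  # left end of the current window in t
    while s <= n - m:
        if t[s:s + m] == p:
            occurrences.append(s)
        # Shift according to the text character under the last pattern position.
        s += shift.get(t[s + m - 1], m)
    return occurrences


if __name__ == "__main__":
    text = "abracadabra"
    print(naive_search("abra", text))     # [0, 7]
    print(horspool_search("abra", text))  # [0, 7]
```

The slice comparison `t[s:s + m] == p` stands in for the character-by-character check that a C implementation would perform. Horspool's advantage comes from shifting the window by up to m positions. On small alphabets the shifts tend to be short. This is one reason why, as noted above, algorithm rankings depend on alphabet size and pattern length.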
