Fast string matching for DNA sequences

Abstract. In this paper, we propose the Maximal Average Shift (MAS) algorithm, which finds a pattern scan order that maximizes the average shift length. We also present two extensions of MAS: one improves the scan speed of MAS by reusing the scan result of the previous window, and the other improves the running time of MAS by using q-grams. These algorithms show better average scan speed than previous string matching algorithms for DNA sequences.
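To make the idea of a scan order and its average shift concrete, the following is a minimal Python sketch. It is not the MAS algorithm from the paper. It assumes an i.i.d. uniform text over a 4-letter alphabet, uses a generic safe-shift rule (the smallest shift consistent with the characters already matched and the one mismatched), and picks the best order by brute force over all permutations, which is only feasible for very short patterns. The names shift_table, average_shift, best_order, and search are hypothetical helpers introduced here for illustration.

```python
from itertools import permutations


def shift_table(pattern, order):
    """table[i]: safe shift after matching order[:i] and mismatching at order[i].
    table[m]: safe shift after a full match of the window."""
    m = len(pattern)
    table = []
    for i in range(m + 1):
        for s in range(1, m + 1):
            # Positions already matched must agree with the shifted pattern.
            matched_ok = all(j - s < 0 or pattern[j - s] == pattern[j]
                             for j in order[:i])
            # The mismatched position must differ in the shifted pattern.
            mismatch_ok = (i == m or order[i] - s < 0
                           or pattern[order[i] - s] != pattern[order[i]])
            if matched_ok and mismatch_ok:
                table.append(s)  # s == m always qualifies, so this is reached
                break
    return table


def average_shift(pattern, order, sigma=4):
    """Expected shift per window, assuming i.i.d. uniform text characters."""
    p = 1.0 / sigma  # probability that a single comparison matches
    m = len(pattern)
    table = shift_table(pattern, order)
    expected = sum((p ** i) * (1 - p) * table[i] for i in range(m))
    return expected + (p ** m) * table[m]


def best_order(pattern, sigma=4):
    """Brute-force the scan order with the largest expected shift."""
    return max(permutations(range(len(pattern))),
               key=lambda o: average_shift(pattern, o, sigma))


def search(text, pattern, order):
    """Report all occurrences, comparing characters in the given scan order."""
    m, n = len(pattern), len(text)
    table = shift_table(pattern, order)
    hits, pos = [], 0
    while pos <= n - m:
        i = 0
        while i < m and text[pos + order[i]] == pattern[order[i]]:
            i += 1
        if i == m:
            hits.append(pos)
        pos += table[i]
    return hits
```

For example, search("GATTAGATTACAGATTA", "GATTA", best_order("GATTA")) scans each window in the order chosen by the brute-force step. Under the stated model, the order it selects has an average shift at least as large as that of the plain left-to-right order, since left-to-right is one of the permutations considered.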
