Paper information - A Fast Pattern Matching Algorithm for Biological Sequences

A Fast Pattern Matching Algorithm for Biological Sequences

With the remarkable increase in the number of DNA and protein sequences, pattern matching has become increasingly important for querying sequence patterns in biological sequence databases. To further improve the performance of pattern matching, a fast exact algorithm called ZTBMH, a variation of the Zhu-Takaoka algorithm, is presented. It adopts the idea of the Boyer-Moore-Horspool algorithm, which uses only the bad-character heuristic, and reduces the number of comparisons, thereby improving performance in practice. The best-, worst- and average-case time complexities of the new algorithm are also discussed in this paper. The experimental results show that the proposed algorithm outperforms the other algorithms compared, especially for small alphabets such as nucleotide sequences, and is therefore well suited to exact pattern matching in biological sequences.
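The abstract does not spell out ZTBMH itself, so the following is only a minimal sketch of the general idea it describes: a Horspool-style search loop driven by a Zhu-Takaoka-style bad-character table indexed by the last two characters of the current text window. The function names `build_pair_shift` and `pair_shift_search` are hypothetical, and the details (for example the whole-window comparison) are assumptions, not the authors' implementation.

```python
def build_pair_shift(pattern):
    """Two-character bad-character table (Zhu-Takaoka style).

    Maps a character pair to the distance from its rightmost occurrence
    in the pattern (excluding the final pair) to the end of the pattern.
    """
    m = len(pattern)
    shift = {}
    # Later (more rightward) occurrences overwrite earlier ones,
    # so each pair ends up with its smallest safe shift.
    for i in range(1, m - 1):
        shift[(pattern[i - 1], pattern[i])] = m - 1 - i
    return shift


def pair_shift_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    if m == 1:
        # A pair table needs at least two characters; scan directly.
        return [i for i, c in enumerate(text) if c == pattern]

    shift = build_pair_shift(pattern)
    first = pattern[0]
    hits = []
    s = 0
    while s <= n - m:
        # Assumption: compare the whole window at once for brevity.
        if text[s:s + m] == pattern:
            hits.append(s)
        # Horspool-style: the shift depends only on the window's tail,
        # here its last two characters instead of one.
        a, b = text[s + m - 2], text[s + m - 1]
        k = shift.get((a, b))
        if k is None:
            # Pair not in the pattern: shift by m - 1 if the last window
            # character could align with pattern[0], otherwise by m.
            k = m - 1 if b == first else m
        s += k
    return hits


print(pair_shift_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))  # [5]
```

With a four-letter nucleotide alphabet a single-character table rarely yields long shifts, because every character tends to occur near the end of the pattern; indexing by pairs gives 16 possible keys instead of 4, which is one plausible reason such a scheme would help on small alphabets.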

Guoyong Cai | Xuezeng Pan | Yong Huang | Yunjun Gao

[1] Tadao Takaoka, et al. On improving the average case of the Boyer-Moore string matching algorithm, 1988.

[2] Rahul Thathoo, et al. TVSBS: a fast exact pattern matching algorithm for biological sequences, 2006.

[3] N. Balakrishnan, et al. A FAST Pattern Matching Algorithm, 2004, J. Chem. Inf. Model.

[4] R. Nigel Horspool, et al. Practical fast searching in strings, 1980, Softw. Pract. Exp.

[5] Thierry Lecroq, et al. Experimental results on string matching algorithms, 1995, Softw. Pract. Exp.

[6] Domenico Cantone, et al. Fast-Search: A New Efficient Variant of the Boyer-Moore String Matching Algorithm, 2003, WEA.

[7] Robert S. Boyer, et al. A fast string searching algorithm, 1977, CACM.

[8] Daniel Sunday, et al. A very fast substring search algorithm, 1990, CACM.

[9] Thierry Lecroq, et al. Handbook of Exact String Matching Algorithms, 2004.