Compressed Pattern Matching in DNA Sequences Using Multithreaded Technology

Compressed pattern matching on large DNA sequence data is very important in bioinformatics. In this paper, multithreaded programming is used to improve performance by searching for the pattern in parallel. Two novel multithreaded algorithms are proposed, named MTd-BM and MTd-Horspool. The first is a variant of the d-BM algorithm, which is based on the Boyer-Moore method. The second is designed analogously to MTd-BM, but uses the Horspool method as its foundation. The experimental results show that these two algorithms are nearly 2 times faster than the d-BM algorithm for long DNA patterns (length > 50). Moreover, compression of the DNA sequences gives a guaranteed space saving of 75%.
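To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' MTd-BM or MTd-Horspool. It assumes the 75% saving comes from storing each base in 2 bits instead of an 8-bit character, and it assumes the parallel search splits the sequence into chunks, one per thread. For simplicity the search runs on the uncompressed string, whereas the paper's algorithms match on the compressed data. The names `pack_2bit`, `horspool`, and `parallel_search` are hypothetical. Note also that in CPython the global interpreter lock prevents threads from speeding up pure-Python loops, so this shows the structure only, not the reported speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed 2-bit code for the four bases (not taken from the paper).
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit(seq):
    """Pack a DNA string into bytes, four bases per byte (2 bits each)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (6 - 2 * (i % 4))
    return bytes(out)


def horspool(text, pattern, start, end):
    """Return match positions p with start <= p < end, using Horspool shifts."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Bad-character table: distance from the last occurrence of each
    # character (excluding the final pattern position) to the pattern end.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = start
    # A match may start inside the chunk and run up to m - 1 characters
    # past its end, so the window is allowed to overlap the next chunk.
    limit = min(end + m - 1, len(text))
    while pos + m <= limit:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


def parallel_search(text, pattern, workers=4):
    """Search each chunk of the text in its own thread; return sorted hits."""
    n = len(text)
    chunk = max(1, -(-n // workers))  # ceiling division
    starts = range(0, n, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda s: horspool(text, pattern, s, min(s + chunk, n)), starts
        )
        return [p for part in parts for p in part]
```

Each thread reports only matches that start inside its own chunk, so a match straddling a chunk boundary is found exactly once. Under the 2-bit assumption, `pack_2bit` stores eight bases in two bytes rather than eight, which is the 75% saving.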
