Enhancing N-Gram-Hirschberg Algorithm by Using Hash Function

Dynamic programming-based algorithms such as the Smith-Waterman algorithm, which produces an optimal alignment, are among the most widely used algorithms for sequence alignment. The Hirschberg algorithm is the space-saving version of the Smith-Waterman algorithm. However, both algorithms are still very computationally intensive. The N-Gram-Hirschberg algorithm was introduced to further reduce the space requirement and, at the same time, to speed up sequence alignment. This research aims to enhance the N-Gram-Hirschberg algorithm by embedding a hash function adopted from the Karp-Rabin exact string matching algorithm. The hash function is used to speed up the transformation process of the algorithm. The new method improves the processing time of N-Gram-Hirschberg without sacrificing the quality of the output. The largest time improvement was obtained with a word length of two, for protein sequences with lengths between 100 and 1000.
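As an illustration of the idea, the following minimal sketch shows how a Karp-Rabin style rolling hash could turn a protein sequence into a list of integer word codes. It is not the authors' implementation: the function name `ngram_hashes`, the 20-letter alphabet ordering, the use of overlapping windows, and the optional `modulus` parameter are all assumptions made for this example.

```python
# Hypothetical sketch: Karp-Rabin style rolling hash over protein n-grams.
# Alphabet order, overlapping windows and all names are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"            # 20 standard residues
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
BASE = len(AMINO_ACIDS)                          # radix of the polynomial hash


def ngram_hashes(seq, n, modulus=None):
    """Return one hash value per overlapping window of length n in seq.

    Without a modulus, each distinct word maps to a distinct integer in
    [0, BASE**n). With a modulus, values are reduced and may collide.
    """
    if n <= 0 or len(seq) < n:
        return []

    high = BASE ** (n - 1)                       # weight of the leading residue

    # Hash of the first window, computed directly (Horner's rule).
    h = 0
    for ch in seq[:n]:
        h = h * BASE + CODE[ch]
    if modulus:
        h %= modulus
    hashes = [h]

    # Each later window is derived from the previous one in O(1):
    # drop the leading residue, shift, and append the new residue.
    for i in range(n, len(seq)):
        h = (h - CODE[seq[i - n]] * high) * BASE + CODE[seq[i]]
        if modulus:
            h %= modulus
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    print(ngram_hashes("ACDA", 2))
```

With a word length of two and no modulus, every word receives a unique code below 400, so identical words in the two sequences can be compared as integers rather than as strings.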
