Integrating FPGA acceleration into HMMer

HMMer is a commonly used package for searching biological sequence databases with profile hidden Markov models (HMMs). It allows researchers to compare HMMs to sequence databases or sequences to HMM databases. However, such searches often take many hours on traditional computer architectures. These runtime requirements are likely to become even more severe due to the rapid growth in the size of both sequence and model databases. We present a new reconfigurable architecture to accelerate the two HMMer database search procedures, hmmsearch and hmmpfam. We describe how this leads to significant runtime savings on off-the-shelf field-programmable gate arrays (FPGAs).
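The two search directions mentioned above differ only in which side of the comparison is the database. The following is a minimal sketch of that loop structure, not HMMer's implementation or the architecture presented here. The names `toy_score`, `search_sequences`, and `search_models` are hypothetical, and the scoring function is a deliberately simplified stand-in (best ungapped window against per-position residue scores) rather than real profile HMM scoring.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a profile HMM: one dict of per-residue scores per position.
# (Assumption for illustration; a real profile HMM also has insert/delete states.)
Profile = List[Dict[str, float]]


def toy_score(profile: Profile, seq: str) -> float:
    """Hypothetical scorer: best ungapped window of the sequence against the profile.

    Residues missing from a position's table get a fixed penalty of -1.0.
    """
    m = len(profile)
    if len(seq) < m:
        return float("-inf")
    return max(
        sum(profile[j].get(seq[i + j], -1.0) for j in range(m))
        for i in range(len(seq) - m + 1)
    )


def search_sequences(profile: Profile, seq_db: Dict[str, str]) -> List[Tuple[str, float]]:
    """hmmsearch-style direction: one model against a database of sequences."""
    hits = [(name, toy_score(profile, seq)) for name, seq in seq_db.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)


def search_models(seq: str, model_db: Dict[str, Profile]) -> List[Tuple[str, float]]:
    """hmmpfam-style direction: one sequence against a database of models."""
    hits = [(name, toy_score(profile, seq)) for name, profile in model_db.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    model = [{"A": 2.0}, {"C": 2.0}]
    print(search_sequences(model, {"s1": "GGACGG", "s2": "GGGG"}))
    print(search_models("GGACGG", {"m1": model, "m2": [{"T": 2.0}]}))
```

In both directions the cost is dominated by repeated calls to the scoring function, one per database entry, which is why the scoring step is the natural target for hardware acceleration.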
