ClawHMMER: A Streaming HMMer-Search Implementation

The proliferation of biological sequence data has created a need for extremely fast probabilistic sequence search. One method for performing this search evaluates, for each sequence in a protein database, the Viterbi probability of a hidden Markov model (HMM) of the desired sequence family. However, current implementations take a long time to search large databases. Many current and upcoming architectures that offer large amounts of compute power are designed with data-parallel execution and streaming in mind. We present a streaming algorithm for evaluating an HMM's Viterbi probability and refine it for the specific HMM used in biological sequence search. We implement our streaming algorithm in the Brook language, which allows us to execute it on graphics processors. We demonstrate that this streaming algorithm, running on graphics processors, can outperform available CPU implementations. We also demonstrate the implementation running on a 16-node graphics cluster.
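As a rough illustration of the computation described above, the following Python sketch evaluates the Viterbi log-probability of a small, fully connected HMM for each sequence in a list. It is not the Brook implementation and does not use the profile HMM topology of biological sequence search; the function names `viterbi_log_prob` and `search_database`, and the table layout (`log_start`, `log_trans`, `log_emit`), are assumptions made for this example only.

```python
import math

def viterbi_log_prob(obs, log_start, log_trans, log_emit):
    """Return the log-probability of the single best state path for `obs`.

    Hypothetical layout, chosen for this sketch:
      log_start[s]     log P(first state is s)
      log_trans[p][s]  log P(next state is s | current state is p)
      log_emit[s][o]   log P(symbol o | state s)
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(log_start)
    # Initialise each state's score with the first observed symbol.
    scores = [log_start[s] + log_emit[s][obs[0]] for s in range(n_states)]
    # For every later symbol, keep only the best-scoring predecessor per state.
    for symbol in obs[1:]:
        scores = [
            max(scores[p] + log_trans[p][s] for p in range(n_states))
            + log_emit[s][symbol]
            for s in range(n_states)
        ]
    return max(scores)

def search_database(sequences, log_start, log_trans, log_emit):
    """Score every sequence independently against the same model."""
    # No sequence depends on another, so this loop is the data-parallel part.
    return [
        viterbi_log_prob(seq, log_start, log_trans, log_emit)
        for seq in sequences
    ]
```

Because each database sequence is scored independently against the same model, the outer loop in `search_database` is the kind of work that maps naturally onto data-parallel hardware; how the paper's streaming formulation organizes the inner recurrence is not shown here.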
