Accelerated Profile HMM Searches

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of the statistical significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
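The idea of an "optimal sum of multiple ungapped local alignment segments" is easiest to see in scalar form. The Python sketch below is a minimal illustration under stated assumptions, not HMMER3's implementation. It uses a hypothetical toy profile of per-position match scores in bits (`toy_profile`), a single uniform entry score `t_bm`, and constant `t_ej`, `t_ec` and `t_loop` transition scores. All of these names and values are illustrative. It does not reproduce the striped SIMD layout, reduced-precision integer scores, or calibrated model parameters. Significance is estimated with a Gumbel survival function. The slope `lam = log 2` and the location `mu` are assumptions standing in for values a real implementation would calibrate per model, and the filter threshold in the usage lines is likewise hypothetical.

```python
import math

NEG_INF = float("-inf")


def toy_profile(consensus, alphabet, match=2.0, mismatch=-1.0):
    """Hypothetical match-score table: msc[k][c] is the bit score of residue c at position k."""
    return [{c: (match if c == cons else mismatch) for c in alphabet} for cons in consensus]


def msv_score(seq, msc, t_bm, t_ej, t_ec, t_loop=0.0):
    """Optimal sum of ungapped local segments under a simplified MSV-style model (bits).

    Each segment enters the profile at any position k (score t_bm), extends
    diagonally through match states only (no insertions or deletions), and exits
    at any position. E->J (t_ej) lets another segment start later in the
    sequence; E->C (t_ec) ends the alignment. t_loop scores each unaligned
    residue emitted by the N, J or C states. All transition scores are
    illustrative constants, not HMMER3's parameterization.
    """
    M = len(msc)
    mprev = [NEG_INF] * M  # match scores for the previous residue
    n = 0.0                # N: unaligned prefix
    j = NEG_INF            # J: unaligned residues between segments
    c = NEG_INF            # C: unaligned suffix after the last segment
    b = n                  # B: begin a new segment (N->B and J->B scored 0 here)
    for ch in seq:
        mcur = [NEG_INF] * M
        e = NEG_INF        # E: best local exit at this residue
        for k in range(M):
            diag = mprev[k - 1] if k > 0 else NEG_INF   # extend the current segment
            mcur[k] = max(diag, b + t_bm) + msc[k][ch]  # ...or start a new one at k
            e = max(e, mcur[k])
        j = max(j + t_loop, e + t_ej)
        c = max(c + t_loop, e + t_ec)
        n += t_loop
        b = max(n, j)
        mprev = mcur
    return c


def gumbel_pvalue(score, mu, lam=math.log(2)):
    """P(S > score) for a Gumbel with location mu and slope lam (both assumed here)."""
    # -expm1(-y) computes 1 - exp(-y) without cancellation when y is tiny
    return -math.expm1(-math.exp(-lam * (score - mu)))


if __name__ == "__main__":
    prof = toy_profile("ABC", "ABCX")
    s = msv_score("ABCXABC", prof, t_bm=-1.0, t_ej=-3.0, t_ec=0.0)
    print(s)                    # 7.0: two ABC segments (5 + 5) minus the E->J cost of 3
    p = gumbel_pvalue(s, mu=-2.0)  # mu is a made-up calibration value
    print(p < 0.02)             # True; 0.02 is a hypothetical filter threshold
```

Because every segment is ungapped, each match cell depends only on its diagonal predecessor in the previous row and on the begin state. There is no delete-state chain within a row. That independence is what lets all profile positions in a row be computed in parallel, as in the striped vector-parallel layout described above. Sequences whose P-value falls below the filter threshold would then be passed on for full Forward/Backward scoring.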