Paper information - HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment

Sequence-based protein function and structure prediction depends crucially on sequence-search sensitivity and accuracy of the resulting sequence alignments. We present an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): 'HMM-HMM–based lightning-fast iterative sequence search' (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/). Compared to the sequence-search tool PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50–100% higher sensitivity and generates more accurate alignments.

A. Biegert | J. Söding | M. Remmert | A. Hauser
