Protein homology detection by HMM-HMM comparison

Motivation: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and the study of evolution.

Results: We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked alongside BLAST, PSI-BLAST, HMMER and the profile-profile comparison tools PROF_SIM and COMPASS, in an all-against-all comparison of a database of 3691 protein domains from SCOP 1.63 with pairwise sequence identities below 20%.

Sensitivity: When the predicted secondary structure is included in the HMMs, HHsearch is able to detect between 2.7 and 4.2 times more homologs than PSI-BLAST or HMMER and between 1.44 and 1.9 times more than COMPASS or PROF_SIM for a rate of false positives of 10%. Approximately half of the improvement over the profile-profile comparison methods is attributable to the use of profile HMMs in place of simple profiles.

Alignment quality: Higher sensitivity is mirrored by an increased alignment quality. HHsearch produced 1.2, 1.7 and 3.3 times more good alignments ('balanced' score >0.3) than the next best method (COMPASS), and 1.6, 2.9 and 9.4 times more than PSI-BLAST, at the family, superfamily and fold level, respectively.

Speed: HHsearch scans a query of 200 residues against 3691 domains in 33 s on an AMD64 2 GHz PC. This is 10 times faster than PROF_SIM and 17 times faster than COMPASS.
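To make the sensitivity comparison concrete, the following is a minimal sketch of how one might count the homologs a method detects at a fixed false-positive rate from a ranked hit list. It is not the benchmark code behind the reported numbers. The function name `homologs_at_error_rate`, the definition of the error rate as FP / (TP + FP), and the toy scores are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def homologs_at_error_rate(
    hits: Iterable[Tuple[float, bool]], max_fp_fraction: float = 0.10
) -> int:
    """Count true homologs found while false positives stay within a fraction.

    hits: (score, is_homolog) pairs; a higher score means a more confident hit.
    Assumption: the error rate is FP / (TP + FP) over the ranked list so far.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    tp = fp = best_tp = 0
    for _score, is_homolog in ranked:
        if is_homolog:
            tp += 1
        else:
            fp += 1
        # Remember the largest true-positive count seen at an acceptable rate.
        if fp / (tp + fp) <= max_fp_fraction:
            best_tp = tp
    return best_tp


# Toy, made-up hit lists for two hypothetical methods (not real benchmark data).
method_a = [(10.0, True), (9.0, True), (8.0, True), (7.0, True), (6.0, True),
            (5.0, True), (4.0, True), (3.0, True), (2.0, True), (1.0, False)]
method_b = [(8.0, True), (7.0, True), (6.0, True), (5.0, False),
            (4.0, False), (3.0, True), (2.0, True), (1.0, True)]

found_a = homologs_at_error_rate(method_a)
found_b = homologs_at_error_rate(method_b)
ratio = found_a / found_b  # "times more homologs" at the chosen error rate
```

A ratio computed this way depends on how the error rate is defined and on the cutoff chosen, so it should be read only as an illustration of the kind of comparison the abstract reports.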
