Finding weak similarities between proteins by sequence profile comparison.

To improve the recognition of weak similarities between proteins, a method for aligning two sequence profiles is proposed. It is shown that exploring the sequence space in the vicinity of a sequence with unknown properties significantly improves the performance of sequence alignment methods. Consistent with previous observations, the recognition sensitivity and alignment accuracy obtained by a profile-profile alignment method can be as much as 30% higher than those obtained by a sequence-profile alignment method. It is demonstrated that the choice of score function and the diversity of the test profile are very important factors for achieving the maximum performance of the method, while the optimum range of these parameters depends on the level of similarity to be recognized.
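
The idea of aligning two profiles can be illustrated with a minimal sketch. The abstract does not specify the score function or gap model, so everything below is an assumption made for illustration: each profile is a list of per-column residue frequency vectors, the column score is a dot product minus a constant `shift`, gaps carry a linear penalty `gap`, and the alignment is a local (Smith-Waterman style) dynamic program. The names `column_score` and `align_profiles` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# One frequency vector per alignment column (assumed representation).
Profile = List[Sequence[float]]


def column_score(p: Sequence[float], q: Sequence[float], shift: float = 0.3) -> float:
    """Assumed score function: dot product of two frequency vectors minus a shift.

    The shift makes dissimilar columns score negatively, which a local
    alignment needs in order to terminate at the edges of a similar region.
    """
    return sum(a * b for a, b in zip(p, q)) - shift


def align_profiles(A: Profile, B: Profile, gap: float = 0.5, shift: float = 0.3) -> float:
    """Return the best local alignment score between profiles A and B.

    Uses a linear gap penalty (an assumption; the paper may use another model).
    """
    n, m = len(A), len(B)
    # H[i][j] holds the best score of a local alignment ending at A[i-1], B[j-1].
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                                    # start a new alignment
                H[i - 1][j - 1] + column_score(A[i - 1], B[j - 1], shift),  # align two columns
                H[i - 1][j] - gap,                                      # gap in B
                H[i][j - 1] - gap,                                      # gap in A
            )
            best = max(best, H[i][j])
    return best


# Toy three-letter alphabet, used only to keep the example short.
profile_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
profile_b = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

score_self = align_profiles(profile_a, profile_a)
score_other = align_profiles(profile_a, profile_b)
```

In this toy case the self-comparison scores higher than the comparison with the unrelated profile. Swapping `column_score` for a different function, or changing how diverse the sequences contributing to each column are, is the kind of variation whose effect on performance the abstract describes.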
