Using evolutionary information for the query and target improves fold recognition

In this study, we show that it is possible to increase the performance over PSI‐BLAST by using evolutionary information for both query and target sequences. This information can be used in three different ways: by sequence linking, profile–profile alignments, and by combining sequence–profile and profile–sequence searches. If only PSI‐BLAST is used, 16% of superfamily‐related protein domains can be detected at 90% specificity, but if a sequence–profile and a profile–sequence search are combined, this is increased to 20%, profile–profile searches detects 19%, whereas a linking procedure identifies 22% of these proteins. All three methods show equal performance, but the best combination of speed and accuracy seems to be obtained by the combined searches, because this method shows a good performance even at high specificity and the lowest computational cost. In addition, we show that the E‐values reported by all these methods, including PSI‐BLAST, underestimate the true rate of false positives. This behavior is seen even if a very strict E‐value cutoff and a limited number of iterations are used. However, the difference is more pronounced with a looser E‐value cutoff and more iterations. Proteins 2003;53:000–000. © 2003 Wiley‐Liss, Inc.

[1]  Golan Yona,et al.  Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. , 2002, Journal of molecular biology.

[2]  C A Orengo,et al.  Genome analysis: Assigning protein coding regions to three‐dimensional structures , 1999 .

[3]  C. Chothia,et al.  Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[4]  C Venclovas,et al.  Comparative modeling of CASP4 target proteins: Combining results of sequence search with three‐dimensional structure assessment , 2001, Proteins.

[5]  A. Godzik,et al.  Sequence clustering strategies improve remote homology recognitions while reducing search times. , 2002, Protein engineering.

[6]  Mounir Errami,et al.  Detection of unrelated proteins in sequences multiple alignments by using predicted secondary structures , 2003, Bioinform..

[7]  C. Chothia,et al.  Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. , 2001, Journal of molecular biology.

[8]  David C. Jones,et al.  GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. , 1999, Journal of molecular biology.

[9]  C. Chothia,et al.  Intermediate sequences increase the detection of homology between sequences. , 1997, Journal of molecular biology.

[10]  D Fischer,et al.  Hybrid fold recognition: combining sequence derived properties with evolutionary information. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[11]  J Lundström,et al.  Pcons: A neural‐network–based consensus predictor that improves fold recognition , 2001, Protein science : a publication of the Protein Society.

[12]  Alejandro A. Schäffer,et al.  IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices , 1999, Bioinform..

[13]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[14]  D Eisenberg,et al.  A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. , 1997, Journal of molecular biology.

[15]  Ralf Zimmer,et al.  Improving Profile-Profile Alignments via Log Average Scoring , 2001, WABI.

[16]  A. Godzik,et al.  Comparison of sequence profiles. Strategies for structural predictions using sequence information , 2008, Protein science : a publication of the Protein Society.

[17]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[18]  D Fischer,et al.  LiveBench‐2: Large‐scale automated evaluation of protein structure prediction servers , 2001, Proteins.

[19]  Chris Sander,et al.  Removing near-neighbour redundancy from large protein sequence collections , 1998, Bioinform..

[20]  R. Abagyan,et al.  Do aligned sequences share the same fold? , 1997, Journal of molecular biology.

[21]  M. Sternberg,et al.  Enhanced genome annotation using structural profiles in the program 3D-PSSM. , 2000, Journal of molecular biology.

[22]  A. Elofsson,et al.  Hidden Markov models that use predicted secondary structures for fold recognition , 1999, Proteins.

[23]  A. Elofsson,et al.  Can correct protein models be identified? , 2003, Protein science : a publication of the Protein Society.

[24]  D. Lipman,et al.  Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[25]  C Sander,et al.  An evolutionary treasure: unification of a broad set of amidohydrolases related to urease , 1997, Proteins.

[26]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[27]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[28]  E. Lindahl,et al.  Identification of related proteins on family, superfamily and fold level. , 2000, Journal of molecular biology.

[29]  D Fischer,et al.  LiveBench‐1: Continuous benchmarking of protein structure prediction servers , 2001, Protein science : a publication of the Protein Society.

[30]  J. M. Bradshaw,et al.  Mutational investigation of the specificity determining region of the Src SH2 domain. , 2000, Journal of molecular biology.

[31]  D. Haussler,et al.  Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. , 1998, Journal of molecular biology.

[32]  Arne Elofsson,et al.  A study on protein sequence alignment quality , 2002, Proteins.

[33]  D. Fischer,et al.  Protein fold recognition using sequence‐derived predictions , 1996, Protein science : a publication of the Protein Society.

[34]  Robert B Russell,et al.  Fold recognition without folds , 2002, Protein science : a publication of the Protein Society.