Use of structure comparison methods for the refinement of protein structure predictions. I. Identifying the structural family of a protein from low‐resolution models

Predicting the three‐dimensional structure of proteins is still one of the most challenging problems in molecular biology. Despite its difficulty, several groups have begun to produce low‐resolution predictions for small proteins consistently. In most of these cases, however, the predictions are still too inaccurate to be useful. In the present article, we address the problem of obtaining better‐quality predictions, starting from low‐resolution models. To this end, we have devised a new procedure that uses these models, together with structure comparison methods, to identify the structural family of the target protein. In a second step, not described in the present work, this would allow the predictions to be refined using conserved features of the identified family. In our approach, the structure database is searched using predictions of a given protein at different levels of accuracy. As query structures, we used both low‐resolution versions of the native structures and different sets of low‐accuracy predictions. In general, we found that for predictions with a resolution of ≥5–7 Å, structure comparison methods were able to place the correct fold among the top‐ranked hits. Proteins 2002;46:72–84. © 2001 Wiley‐Liss, Inc.
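To illustrate the general idea of using a low‐accuracy model as a database query and ranking the hits by structural similarity, here is a minimal Python sketch. It is not the procedure used in the article. It substitutes a simple distance‐matrix RMS deviation (dRMS) for the structure comparison methods, which the abstract does not name. It assumes every database entry is already residue‐aligned to the query and has the same length. It also uses a hypothetical `degrade` helper that adds Gaussian noise to a native Cα trace to mimic a low‐resolution version of it. The helix and strand templates are toy geometries, not real database entries.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(ca: Sequence[Coord]) -> List[List[float]]:
    """All pairwise C-alpha distances (Angstrom) for one chain."""
    return [[math.dist(p, q) for q in ca] for p in ca]


def drms(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Distance-matrix RMS deviation between two equal-length, pre-aligned traces.

    Only internal distances are compared, so no superposition
    (rotation/translation fitting) step is needed.
    """
    if len(a) != len(b):
        raise ValueError("traces must have the same number of residues")
    da, db = distance_matrix(a), distance_matrix(b)
    n = len(a)
    diffs = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else 0.0


def degrade(ca: Sequence[Coord], sigma: float, seed: int = 0) -> List[Coord]:
    """Hypothetical helper: blur a native trace with Gaussian noise to mimic
    a low-resolution model (larger sigma gives a worse model)."""
    rng = random.Random(seed)
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in ca]


def rank_folds(query: Sequence[Coord],
               database: Dict[str, Sequence[Coord]],
               top: int = 5) -> List[Tuple[str, float]]:
    """Rank database entries by dRMS to the query, most similar first.

    Entries whose length differs from the query are skipped, because this
    sketch has no alignment step to establish residue correspondence.
    """
    scored = sorted(
        (drms(query, coords), name)
        for name, coords in database.items()
        if len(coords) == len(query)
    )
    return [(name, score) for score, name in scored[:top]]


if __name__ == "__main__":
    # Toy templates: an ideal alpha-helix (2.3 A radius, 1.5 A rise,
    # 100 degrees per residue) and a straight, fully extended strand-like trace.
    helix = [(2.3 * math.cos(math.radians(100 * i)),
              2.3 * math.sin(math.radians(100 * i)),
              1.5 * i) for i in range(20)]
    strand = [(3.3 * i, 0.0, 0.0) for i in range(20)]
    # A noisy "low-resolution" version of the helix serves as the query.
    query = degrade(helix, sigma=1.0, seed=1)
    for name, score in rank_folds(query, {"helix": helix, "strand": strand}):
        print(f"{name}: dRMS = {score:.2f} A")
```

Because dRMS compares internal distances only, it is invariant to rigid‐body motion and needs no superposition. Real fold‐recognition searches must also find the residue correspondence between query and template, which is the difficult part this sketch deliberately skips.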
