Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.

We explore the ability of a simple simulated annealing procedure to assemble native-like structures from fragments of unrelated protein structures with similar local sequences, using Bayesian scoring functions. Environment-specific and residue-pair-specific contributions to the scoring functions appear as the first two terms in a series expansion for the residue probability distributions in the protein database; the decoupling of the distance and environment dependencies of the distributions resolves the major problems with current database-derived scoring functions noted by Thomas and Dill. The simulated annealing procedure rapidly and frequently generates native-like structures for small helical proteins and better-than-random structures for small β-sheet-containing proteins. Most of the simulated structures have native-like solvent accessibility and secondary structure patterns, and thus ensembles of these structures provide a particularly challenging set of decoys for evaluating scoring functions. We investigate the effects of multiple sequence information and different types of conformational constraints on the overall performance of the method, and the ability of a variety of recently developed scoring functions to recognize the native-like conformations in the ensembles of simulated structures.
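The fragment-assembly idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the backbone is reduced to a list of (phi, psi) torsion pairs, the move set is "replace a window of torsions with a fragment from a library", and acceptance follows a standard Metropolis criterion with a linear cooling schedule. The names `anneal`, `toy_score`, `toy_library`, the extended starting chain, and all parameter values are assumptions chosen for illustration. The toy score simply counts non-helical residues and stands in for the Bayesian scoring functions, which the abstract does not specify in detail.

```python
import math
import random

# Idealized torsion pairs used only by the toy example below (assumed values).
HELIX = (-60.0, -45.0)
STRAND = (-120.0, 130.0)


def anneal(n_res, fragments, score, n_steps=2000, t_start=5.0, t_end=0.1, seed=0):
    """Fragment-insertion simulated annealing over (phi, psi) torsions.

    fragments maps a window start index to a list of candidate fragments,
    each a list of (phi, psi) pairs. score is any callable returning a
    number where lower is better (a stand-in for a real scoring function).
    """
    rng = random.Random(seed)
    starts = sorted(fragments)
    # Start from an extended chain (an assumption for this sketch).
    conf = [(-150.0, 150.0)] * n_res
    current = score(conf)
    best, best_score = list(conf), current
    for step in range(n_steps):
        # Linear cooling from t_start down to t_end.
        temp = t_start + (t_end - t_start) * step / max(1, n_steps - 1)
        # Move: overwrite one window of torsions with a library fragment.
        start = rng.choice(starts)
        frag = rng.choice(fragments[start])
        trial = conf[:start] + list(frag) + conf[start + len(frag):]
        trial_score = score(trial)
        delta = trial_score - current
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            conf, current = trial, trial_score
            if current < best_score:
                best, best_score = list(conf), current
    return best, best_score


def toy_score(conf):
    """Hypothetical score: number of residues that are not helical."""
    return sum(1 for torsions in conf if torsions != HELIX)


def toy_library(n_res, frag_len=3):
    """Hypothetical library: one helical and one strand fragment per window."""
    return {
        s: [[HELIX] * frag_len, [STRAND] * frag_len]
        for s in range(n_res - frag_len + 1)
    }


if __name__ == "__main__":
    best, best_score = anneal(10, toy_library(10), toy_score)
    print(best_score, best)
```

In a realistic setting the library would hold fragments drawn from unrelated proteins with similar local sequence at each window, and `score` would be replaced by a database-derived function; the loop structure is the only part this sketch is meant to convey.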
