Assembly of protein structure from sparse experimental data: An efficient Monte Carlo model

A new, efficient method is proposed and evaluated for assembling protein tertiary structure from known, loosely encoded secondary structure restraints and sparse information about exact side chain contacts. The method builds on a new, very simple reduced model of protein structure and dynamics, in which the protein is described as a lattice chain connecting side chain centers of mass rather than Cα atoms. The model has implicit, built‐in multibody correlations that simulate short‐ and long‐range packing preferences and hydrogen bonding cooperativity, together with a mean force potential describing hydrophobic interactions. Owing to the simplicity of the protein representation and of the force field, the Monte Carlo algorithm is at least an order of magnitude faster than previously published Monte Carlo algorithms for structure assembly. In contrast to existing algorithms, the new method requires fewer tertiary restraints for successful fold assembly: on average, one for every seven residues rather than one for every four. For smaller proteins such as the B domain of protein G, the resulting structures have a coordinate root mean square deviation (cRMSD) of about 3 Å from the experimental structure; for myoglobin, the assembled structures have a backbone cRMSD of 4.3 Å; and for a 247‐residue TIM barrel, the cRMSD of the resulting folds is about 6 Å. As expected, increasing the number of tertiary restraints improves the accuracy of the assembled structures. The reliability and robustness of the new method should enable its routine application in model building protocols based on various (very sparse) experimentally derived structural restraints. Proteins 32:475–494, 1998. © 1998 Wiley‐Liss, Inc.
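The search itself is a Monte Carlo simulation of a lattice chain. One common way sparse restraints enter such a search is as penalty terms in a Metropolis energy; the sketch below shows only that idea, under loudly stated assumptions. It uses a toy simple cubic lattice with unit bonds, end moves and corner flips only, and a single hypothetical flat-bottom restraint penalty (the names `restraint_energy`, `propose`, `metropolis` and the parameters `d0` and `k` are illustrative, not from the paper). The paper's model uses a different lattice for side chain centers of mass and adds packing, hydrogen bonding and hydrophobic terms that are omitted here.

```python
import math
import random

# Toy assumption: simple cubic lattice with unit bonds. The paper's model uses
# a different lattice for side chain centers of mass.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dist2(a, b):
    """Squared Euclidean distance between two lattice points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def restraint_energy(chain, restraints, d0=1.5, k=1.0):
    """Hypothetical flat-bottom penalty: zero while residues i and j are within
    d0 lattice units, quadratic in the excess distance beyond d0."""
    e = 0.0
    for i, j in restraints:
        excess = math.sqrt(dist2(chain[i], chain[j])) - d0
        if excess > 0:
            e += k * excess ** 2
    return e


def propose(chain, rng):
    """Trial chain from an end move or a corner flip; None if the move is
    impossible or would put two residues on the same lattice site."""
    n = len(chain)
    i = rng.randrange(n)
    if i == 0 or i == n - 1:
        # End move: re-place the terminal residue next to its only neighbor.
        anchor = chain[1] if i == 0 else chain[n - 2]
        step = rng.choice(STEPS)
        site = tuple(a + s for a, s in zip(anchor, step))
    else:
        prev, nxt = chain[i - 1], chain[i + 1]
        if dist2(prev, nxt) != 2:
            return None  # residue i does not sit at a right-angle corner
        # Corner flip: move residue i to the opposite corner of the unit
        # square spanned by its two neighbors (bond lengths stay 1).
        site = tuple(p + q - c for p, q, c in zip(prev, nxt, chain[i]))
    if site in set(chain):
        return None  # excluded volume (also rejects a null end move)
    trial = list(chain)
    trial[i] = site
    return trial


def metropolis(chain, restraints, n_steps, temperature, seed=0):
    """Metropolis sampling: accept a trial with probability min(1, exp(-dE/T));
    a rejected or impossible move leaves the current chain unchanged."""
    rng = random.Random(seed)
    chain = list(chain)
    e = restraint_energy(chain, restraints)
    for _ in range(n_steps):
        trial = propose(chain, rng)
        if trial is None:
            continue
        e_trial = restraint_energy(trial, restraints)
        de = e_trial - e
        if de <= 0 or rng.random() < math.exp(-de / temperature):
            chain, e = trial, e_trial
    return chain, e
```

Starting from an extended chain, for example `metropolis([(i, 0, 0) for i in range(12)], [(0, 11)], n_steps=20000, temperature=0.1)`, the sampler should pull the two restrained termini toward each other while the move set preserves chain connectivity and excluded volume. Both moves are symmetric proposals, so plain Metropolis acceptance satisfies detailed balance; a production code would update the energy incrementally rather than rescoring every restraint after each move.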
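Accuracy above is reported as cRMSD, the root mean square deviation between model and experimental coordinates after optimal rigid-body superposition. A minimal pure-Python sketch of that measure follows, using Horn's quaternion method as an assumed implementation choice: after centering both point sets, cRMSD² = (Σ|xᵢ|² + Σ|yᵢ|² − 2λmax)/N, where λmax is the largest eigenvalue of a 4×4 symmetric matrix built from the cross-covariance of the coordinates (found here by cyclic Jacobi rotations). The function name `crmsd` is hypothetical; an SVD-based Kabsch superposition would give the same value.

```python
import math


def _centered(coords):
    """Subtract the centroid from every 3D point."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]


def _max_eigenvalue_sym4(a, sweeps=50):
    """Largest eigenvalue of a symmetric 4x4 matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = 4
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        scale = sum(v * v for row in a for v in row)
        if off <= 1e-30 * scale:
            break  # off-diagonal part is negligible: diagonal holds eigenvalues
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def crmsd(x, y):
    """Coordinate RMSD of two equal-length 3D point sets after optimal
    superposition by a proper rotation plus translation."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty point sets of equal length")
    xc, yc = _centered(x), _centered(y)
    # Cross-covariance matrix S[i][j] = sum over points of x_i * y_j.
    s = [[sum(p[i] * q[j] for p, q in zip(xc, yc)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _max_eigenvalue_sym4(key)
    g = sum(v * v for p in xc for v in p) + sum(v * v for p in yc for v in p)
    # Clamp tiny negative values caused by floating-point cancellation.
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(x)))
```

For instance, `crmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (2, 0, 0)])` is 0.5, because after centering each point is off by 0.5 along the shared axis. The quaternion formulation only searches proper rotations, so a mirror-image model is not superimposed onto its enantiomer.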
