Identifying native‐like protein structures using physics‐based potentials

As the field of structural genomics matures, new methods will be required that can accurately and rapidly distinguish reliable structure predictions from those that are more dubious. We present a method based on the CHARMM gas phase implicit hydrogen force field in conjunction with a generalized Born implicit solvation term that allows one to make such discrimination. We begin by analyzing pairs of threaded structures from the EMBL database, and find that it is possible to identify the misfolded structures with over 90% accuracy. Further, we find that misfolded states are generally favored by the solvation term due to the mispairing of favorable intramolecular ionic contacts. We also examine 29 sets of 29 misfolded globin sequences from Levitt's “Decoys ‘R’ Us” database generated using a sequence homology‐based method. Again, we find that discrimination is possible with approximately 90% accuracy. Also, even in these less distorted structures, mispairing of ionic contacts results in a more favorable solvation energy for misfolded states. This is also found to be the case for collapsed, partially folded conformations of CspA and protein G taken from folding free energy calculations. We also find that the inclusion of the generalized Born solvation term, in postprocess energy evaluation, improves the correlation between structural similarity and energy in the globin database. This significantly improves the reliability of the hypothesis that more energetically favorable structures are also more similar to the native conformation. Additionally, we examine seven extensive collections of misfolded structures created by Park and Levitt using a four‐state reduced model also contained in the “Decoys ‘R’ Us” database. 
Results from these large databases confirm those obtained in the EMBL and misfolded globin databases concerning predictive accuracy, the energetic advantage of misfolded proteins regarding the solvation component, and the improved correlation between energy and structural similarity due to implicit solvation. Z‐scores computed for these databases are improved by including the generalized Born implicit solvation term, and are found to be comparable to trained and knowledge‐based scoring functions. Finally, we briefly explore the dynamic behavior of a misfolded protein relative to properly folded conformations. We demonstrate that the misfolded conformation diverges quickly from its initial structure while the properly folded states remain stable. Proteins in this study are shown to be more stable than their misfolded counterparts and readily identified based on energetic as well as dynamic criteria. In summary, we demonstrate the utility of physics‐based force fields in identifying native‐like conformations in a variety of preconstructed structural databases. The details of this discrimination are shown to be dependent on the construction of the structural database.

© 2002 Wiley Periodicals, Inc. J Comput Chem 23: 147–160, 2002
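The three figures of merit named in the abstract (discrimination of the native structure, Z‐scores, and the correlation between energy and structural similarity) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the sign convention for the Z‐score (native energy minus mean decoy energy, divided by the decoy standard deviation, so that a more negative value means better discrimination), and the toy numbers are all assumptions made for illustration. In practice the energies would come from a force field evaluation such as CHARMM with a generalized Born term, which is not reproduced here.

```python
import math
from statistics import mean, pstdev


def z_score(native_energy, decoy_energies):
    """Z-score of the native energy against a decoy set.

    Assumed convention: (E_native - mean(E_decoys)) / std(E_decoys),
    using the population standard deviation of the decoys.
    """
    mu = mean(decoy_energies)
    sigma = pstdev(decoy_energies)
    return (native_energy - mu) / sigma


def native_ranked_first(native_energy, decoy_energies):
    """True if the native structure has the lowest energy in the set."""
    return all(native_energy < e for e in decoy_energies)


def pearson(xs, ys):
    """Pearson correlation, e.g. between decoy RMSD and decoy energy."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data (arbitrary energy units, RMSD in angstroms).
decoy_energies = [-110.0, -100.0, -90.0]
decoy_rmsd = [2.0, 4.0, 6.0]
native_energy = -130.0

z = z_score(native_energy, decoy_energies)
r = pearson(decoy_rmsd, decoy_energies)
found = native_ranked_first(native_energy, decoy_energies)
```

Under these assumptions, a strongly negative `z` together with a positive `r` (energy rising as decoys move away from the native structure) corresponds to the behavior the abstract describes as improved by including the solvation term.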