Generalized ensemble methods for de novo structure prediction

Current methods for predicting protein structure depend on two interrelated components: (i) an energy function that should have a low value near the correct structure and (ii) a method for searching through different conformations of the polypeptide chain. Identifying the most efficient search methods is essential if such methods are to be applied broadly and with confidence. Efficient search methods also provide a rigorous test of existing energy functions, which are generally knowledge-based and combine different terms with arbitrary weights. Here, we test different search methods with one of the most accurate and predictive energy functions, namely Rosetta, the knowledge-based force field from Baker's group [Simons K, Kooperberg C, Huang E, Baker D (1997) J Mol Biol 268:209–225]. We use an implementation of a generalized ensemble search method that scales relevant parts of the energy function. This method, known as Hamiltonian Replica Exchange Monte Carlo, samples low-energy states better than the original Monte Carlo Simulated Annealing used in the Rosetta package. It also outperforms another widely used generalized ensemble search method, Temperature Replica Exchange Monte Carlo. Our results reveal clear deficiencies in the low-resolution Rosetta energy function: the lowest-energy structures are not necessarily the most native-like. Using a set of nonnative low-energy structures found by our extensive sampling, we found that the long-range and short-range backbone hydrogen-bonding terms of the Rosetta energy discriminate between nonnative and native-like structures significantly better than the low-resolution score used in Rosetta.
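The exchange step behind Hamiltonian Replica Exchange Monte Carlo can be illustrated with a minimal sketch. It is not the Rosetta implementation: the one-dimensional toy energy, the scaling factors in `lambdas`, and the names `scaled_energy`, `swap_probability`, and `run_hrex` are all assumptions made for illustration. Each replica samples an energy in which one term is scaled by its own factor, and neighboring replicas exchange conformations according to the standard Metropolis criterion for replica exchange.

```python
import math
import random


def scaled_energy(x, lam):
    """Toy energy (hypothetical): a fixed term plus a rugged term scaled by lam."""
    base = 0.1 * x * x                 # unscaled part, keeps x bounded
    rugged = 1.0 - math.cos(3.0 * x)   # rugged part, scaled per replica
    return base + lam * rugged


def swap_probability(x_i, x_j, lam_i, lam_j, beta=1.0):
    """Metropolis acceptance probability for exchanging two replicas' conformations."""
    before = scaled_energy(x_i, lam_i) + scaled_energy(x_j, lam_j)
    after = scaled_energy(x_j, lam_i) + scaled_energy(x_i, lam_j)
    delta = after - before
    return 1.0 if delta <= 0 else math.exp(-beta * delta)


def run_hrex(lambdas, n_sweeps=2000, step=0.5, beta=1.0, seed=0):
    """Run a toy Hamiltonian replica exchange; lambdas[0] is the target replica."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in lambdas]
    best = scaled_energy(xs[0], lambdas[0])
    for sweep in range(n_sweeps):
        # Local Metropolis move in every replica under its own scaled energy.
        for k, lam in enumerate(lambdas):
            trial = xs[k] + rng.uniform(-step, step)
            d = scaled_energy(trial, lam) - scaled_energy(xs[k], lam)
            if d <= 0 or rng.random() < math.exp(-beta * d):
                xs[k] = trial
        # Attempt swaps between neighboring replicas, alternating even and odd pairs.
        for i in range(sweep % 2, len(lambdas) - 1, 2):
            p = swap_probability(xs[i], xs[i + 1], lambdas[i], lambdas[i + 1], beta)
            if rng.random() < p:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
        # Track the lowest energy seen in the target replica.
        best = min(best, scaled_energy(xs[0], lambdas[0]))
    return best, xs


if __name__ == "__main__":
    best, xs = run_hrex([1.0, 0.6, 0.3, 0.1])
    print("lowest target-replica energy:", best)
```

Replicas with a small scaling factor see a smoother landscape and cross barriers more easily, and accepted swaps pass their conformations toward the fully weighted replica. In Temperature Replica Exchange the replicas would instead differ in `beta` while sharing one energy function.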
