Scientific benchmarks for guiding macromolecular energy function improvement.

Accurate energy functions are critical to macromolecular modeling and design. We describe new tools for identifying inaccuracies in energy functions and guiding their improvement, and illustrate the application of these tools to the improvement of the Rosetta energy function. The feature analysis tool identifies discrepancies between structures deposited in the PDB and low-energy structures generated by Rosetta; these likely arise from inaccuracies in the energy function. The optE tool optimizes the weights on the different components of the energy function by maximizing the recapitulation of a wide range of experimental observations. We use the tools to examine three proposed modifications to the Rosetta energy function: improving the unfolded state energy model (reference energies), using bicubic spline interpolation to generate knowledge-based torsional potentials, and incorporating the recently developed Dunbrack 2010 rotamer library (Shapovalov & Dunbrack, 2011).
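The idea of fitting term weights so that the energy function recapitulates observations can be illustrated with a small sketch. This is not the optE implementation: it assumes a linear energy (a weighted sum of term values), a Boltzmann model in which the observed ("native") option competes against alternatives, and plain gradient ascent on the log-likelihood of the native option. The function names, the toy feature values, and the step size are all hypothetical choices made for illustration.

```python
import math

def energy(weights, features):
    # Linear energy: weighted sum of per-term values (assumed functional form).
    return sum(w * f for w, f in zip(weights, features))

def native_log_likelihood(weights, cases):
    # Sum over cases of log p(native), with p proportional to exp(-E).
    total = 0.0
    for native, decoys in cases:
        energies = [energy(weights, native)] + [energy(weights, d) for d in decoys]
        lowest = min(energies)
        # Numerically stable log of the partition function.
        log_z = -lowest + math.log(sum(math.exp(-(e - lowest)) for e in energies))
        total += -energies[0] - log_z
    return total

def fit_weights(cases, n_terms, step=0.1, iters=500):
    # Gradient ascent on the mean log-likelihood of the native option.
    weights = [1.0] * n_terms
    for _ in range(iters):
        grad = [0.0] * n_terms
        for native, decoys in cases:
            options = [native] + list(decoys)
            energies = [energy(weights, o) for o in options]
            lowest = min(energies)
            boltz = [math.exp(-(e - lowest)) for e in energies]
            z = sum(boltz)
            probs = [b / z for b in boltz]
            for k in range(n_terms):
                expected = sum(p * o[k] for p, o in zip(probs, options))
                # d log p(native) / d w_k = E_p[f_k] - f_k(native)
                grad[k] += (expected - native[k]) / len(cases)
        weights = [w + step * g for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy data: (native term values, list of decoy term values).
# Term 0 is always lowest for the native; term 1 is uninformative.
cases = [
    ([1.0, 2.0], [[3.0, 1.0], [2.5, 2.5]]),
    ([0.5, 1.0], [[2.0, 0.5], [1.5, 1.5]]),
    ([1.5, 0.5], [[2.5, 1.0], [3.0, 0.0]]),
]
initial = [1.0, 1.0]
fitted = fit_weights(cases, n_terms=2)
print("fitted weights:", fitted)
print("log-likelihood before:", native_log_likelihood(initial, cases))
print("log-likelihood after:", native_log_likelihood(fitted, cases))
```

In this toy setting the fit raises the weight on the term that separates natives from decoys and lowers the weight on the uninformative term. Because the toy data are perfectly separable, the weights would keep growing with more iterations; a real fit would need regularization or a fixed iteration budget.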

[1]  E. Coutsias,et al.  Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling , 2009, Nature Methods.

[2]  Vasantha Pattabhi,et al.  CH...O Hydrogen Bonds in β-sheets , 1997 .

[3]  W. L. Jorgensen,et al.  Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids , 1996 .

[4]  D. Baker,et al.  Native protein sequences are close to optimal for their structures. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Roland L. Dunbrack,et al.  Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. , 1997, Journal of molecular biology.

[6]  Roland L. Dunbrack Rotamer libraries in the 21st century. , 2002, Current opinion in structural biology.

[7]  George A. Kaminski,et al.  Force Field Validation Using Protein Side Chain Prediction , 2002 .

[8]  M. Sippl Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. , 1990, Journal of molecular biology.

[9]  D Gilis,et al.  Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence. , 1997, Journal of molecular biology.

[10]  J. Ponder,et al.  Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. , 1987, Journal of molecular biology.

[11]  Mark Bailey,et al.  The Grammar of Graphics , 2007, Technometrics.

[12]  N. Grishin,et al.  Side‐chain modeling with an optimized scoring function , 2002, Protein science : a publication of the Protein Society.

[13]  D. Baker,et al.  Improved recognition of native‐like protein structures using a combination of sequence‐dependent and sequence‐independent features of proteins , 1999, Proteins.

[14]  Sandro Bottaro,et al.  Potentials of Mean Force for Protein Structure Prediction Vindicated, Formalized and Generalized , 2010, PloS one.

[15]  U. Singh,et al.  A NEW FORCE FIELD FOR MOLECULAR MECHANICAL SIMULATION OF NUCLEIC ACIDS AND PROTEINS , 1984 .

[16]  Roland L. Dunbrack,et al.  A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. , 2011, Structure.

[17]  Jens Meiler,et al.  ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. , 2011, Methods in enzymology.

[18]  Andrew Leaver-Fay,et al.  A Generic Program for Multistate Protein Design , 2011, PloS one.

[19]  B. Kuhlman,et al.  Computational protein design with explicit consideration of surface hydrophobic patches , 2012, Proteins.

[20]  D. Baker,et al.  Design of a Novel Globular Protein Fold with Atomic-Level Accuracy , 2003, Science.

[21]  L. Serrano,et al.  Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. , 2002, Journal of molecular biology.

[22]  R. A. Pasternak Crystallographic evidence for the existence of B7O , 1959 .

[23]  Olga Kennard,et al.  Crystallographic evidence for the existence of CH···O, CH···N and CH···Cl hydrogen bonds , 1982 .

[24]  D. Baker,et al.  Atomic accuracy in predicting and designing non-canonical RNA structure , 2010, Nature Methods.

[25]  Tanja Kortemme,et al.  Control of protein signaling using a computationally designed GTPase/GEF orthogonal pair , 2012, Proceedings of the National Academy of Sciences.

[26]  D. Baker,et al.  An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. , 2003, Journal of molecular biology.

[27]  M Karplus,et al.  Protein sidechain conformer prediction: a test of the energy function. , 1998, Folding & design.

[28]  J. Richardson,et al.  Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. , 1999, Journal of molecular biology.

[29]  J. Richardson,et al.  “THE PLOT” THICKENS: MORE DATA, MORE DIMENSIONS, MORE USES , 2013 .

[30]  Shiow-Fen Hwang,et al.  SODOCK: Swarm optimization for highly flexible protein–ligand docking , 2007, J. Comput. Chem..

[31]  D. Baker,et al.  Alternate states of proteins revealed by detailed energy landscape mapping. , 2011, Journal of molecular biology.

[32]  Tanja Kortemme,et al.  Potential functions for hydrogen bonds in protein structure prediction and design. , 2005, Advances in protein chemistry.

[33]  D. Baker,et al.  High-resolution Structural and Thermodynamic Analysis of Extreme Stabilization of Human Procarboxypeptidase by Computational Protein Design , 2007, Journal of molecular biology.

[34]  D. Baker,et al.  Structure-guided forcefield optimization , 2011, Proteins.

[35]  David Baker,et al.  Algorithm discovery by protein folding game players , 2011, Proceedings of the National Academy of Sciences.

[36]  Jasmine L. Gallaher,et al.  Alteration of enzyme specificity by computational loop remodeling and design , 2009, Proceedings of the National Academy of Sciences.

[37]  D. Baker,et al.  RosettaHoles: Rapid assessment of protein core packing for structure prediction, refinement, design, and validation , 2008, Protein science : a publication of the Protein Society.

[38]  Yann LeCun,et al.  Loss Functions for Discriminative Training of Energy-Based Models , 2005, AISTATS.

[39]  M. Karplus,et al.  Effective energy function for proteins in solution , 1999, Proteins.

[40]  Niles A Pierce,et al.  Protein design is NP-hard. , 2002, Protein engineering.

[41]  Hadley Wickham,et al.  The Split-Apply-Combine Strategy for Data Analysis , 2011 .

[42]  C Kooperberg,et al.  Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. , 1997, Journal of molecular biology.

[43]  David Baker,et al.  High-resolution structural validation of the computational redesign of human U1A protein. , 2006, Structure.

[44]  D. Baker,et al.  Role of conformational sampling in computing mutation‐induced changes in protein structure and stability , 2011, Proteins.

[45]  Hadley Wickham,et al.  A Layered Grammar of Graphics , 2010 .

[46]  O. Schueler‐Furman,et al.  Improved side‐chain modeling for protein–protein docking , 2005, Protein science : a publication of the Protein Society.

[47]  Vincent B. Chen,et al.  Correspondence e-mail: , 2000 .

[48]  D. Baker,et al.  Rapid protein fold determination using unassigned NMR data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[49]  G. Schreiber,et al.  Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. , 2009, Protein engineering, design & selection : PEDS.

[50]  M. Levitt,et al.  Energy functions that discriminate X-ray and near native folds from well-constructed decoys. , 1996, Journal of molecular biology.

[51]  R. Jernigan,et al.  Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation , 1985 .

[52]  David Baker,et al.  Protein-protein docking with backbone flexibility. , 2007, Journal of molecular biology.

[53]  Chen Yanover,et al.  Optimizing energy functions for protein–protein interface design , 2011, J. Comput. Chem..

[54]  David Baker,et al.  Protein Structure Prediction Using Rosetta , 2004, Numerical Computer Methods, Part D.

[55]  Roland L. Dunbrack,et al.  Backbone-dependent rotamer library for proteins. Application to side-chain prediction. , 1993, Journal of molecular biology.

[56]  P. Bradley,et al.  Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers , 2011, Nucleic acids research.

[57]  Jens Meiler,et al.  RosettaScripts: A Scripting Language Interface to the Rosetta Macromolecular Modeling Suite , 2011, PloS one.

[58]  Feng Ding,et al.  Correction: Emergence of Protein Fold Families through Rational Design , 2006, PLoS Comput. Biol..

[59]  M. Karplus,et al.  An analysis of incorrectly folded protein models. Implications for structure predictions. , 1984, Journal of molecular biology.