A graphical model approach for predicting free energies of association for protein-protein interactions under backbone and side-chain flexibility

Biomolecular systems are governed by changes in free energy, so the ability to predict binding free energies offers both a better understanding of biomolecular interactions and a means to optimize them. We present GOBLIN (Graphical mOdel for BiomoLecular INteractions), the first graphical-model-based approach for predicting binding free energies for all-atom models of protein complexes. Our method is physically sound: internal energies are computed using standard molecular-mechanics force fields, and free energies are obtained by computing a rigorous approximation to the partition function of the system. Moreover, GOBLIN explicitly models both backbone and side-chain flexibility and, when desired, employs non-linear regression to optimize force-field parameters. In tests on a benchmark set of more than 700 mutants, we show that our method is fast, running in a few minutes, and accurate, achieving a root mean square error (RMSE) of 2.05 kcal/mol between predicted and experimental binding free energies. GOBLIN’s RMSE is 0.55 kcal/mol lower than that of the well-known program ROSETTA, even though we use the ROSETTA force field to compute internal energies. That is, our increase in accuracy is due to our ability to accurately estimate entropic contributions to the free energy. Finally, optimizing force-field parameters on specific protein complexes with our novel algorithm reduced GOBLIN’s RMSE by 0.26 kcal/mol on average.
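The quantities named in the abstract can be illustrated with a small sketch. It assumes a toy system whose conformations can be enumerated exactly, so the partition function Z is a plain sum of Boltzmann factors and G = -kT ln Z. This is not GOBLIN's method, which approximates Z with a graphical model; the function names, the temperature default, and the three-way decomposition into complex and two unbound partners are illustrative assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy(energies, temperature=300.0):
    """Return G = -kT ln Z for a finite list of conformation energies (kcal/mol).

    Assumption: the conformations are enumerable, so Z is an exact sum.
    """
    kt = K_B * temperature
    e_min = min(energies)
    # Shift by the minimum energy so the exponentials cannot overflow.
    z_shifted = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(z_shifted)


def binding_free_energy(complex_e, partner_a_e, partner_b_e, temperature=300.0):
    """Return G(complex) - G(partner A) - G(partner B) in kcal/mol."""
    return (free_energy(complex_e, temperature)
            - free_energy(partner_a_e, temperature)
            - free_energy(partner_b_e, temperature))


def rmse(predicted, experimental):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(experimental):
        raise ValueError("sequences must have the same length")
    squared = [(p - x) ** 2 for p, x in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))
```

In this toy setting, a state with two equally low-energy conformations has a free energy kT ln 2 below that of a single conformation at the same energy. That difference is purely entropic, which is the kind of contribution the abstract credits for the accuracy gain over scoring by internal energy alone.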
