Beyond rotamers: a generative, probabilistic model of side chains in proteins

Background
Accurately covering the conformational space of amino acid side chains is essential for important applications such as protein design, docking and high-resolution structure prediction. Today, the most common way to capture this conformational space is through rotamer libraries: discrete collections of side chain conformations derived from experimentally determined protein structures. The discretization can be exploited to search the conformational space efficiently. However, discretizing this naturally continuous space comes at the cost of losing detailed information that is crucial for certain applications. For example, rigorously combining rotamers with physical force fields is associated with numerous problems.

Results
In this work we present BASILISK, a generative, probabilistic model of the conformational space of side chains that makes it possible to sample in continuous space. In addition, sampling can be conditioned on the protein's detailed backbone conformation, again in continuous space and without any discretization.

Conclusions
A careful analysis of the model and a comparison with various rotamer libraries indicate that the model forms an excellent, fully continuous model of side chain conformational space. We also illustrate how the model can be used for rigorous, unbiased sampling with a physical force field, and how it improves side chain prediction when used as a pseudo-energy term. In conclusion, BASILISK is an important step towards a rigorous probabilistic description of protein structure in continuous space and in atomic detail.
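To make the three ideas above concrete (sampling side chain dihedrals in continuous space, conditioning that sampling on the backbone, and using the model as a pseudo-energy), here is a minimal sketch. It assumes a simple mixture model: a discrete hidden state h with prior weights, and, given h, independent von Mises distributions for the backbone angles (phi, psi) and for each chi angle. Conditioning on the backbone is then Bayes' rule over h, and the pseudo-energy is taken to be -log p(chi | phi, psi). This is an illustrative stand-in, not BASILISK's actual network; all names (`HiddenState`, `STATES`, `sample_chis`, `pseudo_energy`) and every parameter value are hypothetical toy choices.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Angle = float                  # radians
VM = Tuple[float, float]       # (mean mu, concentration kappa) of a von Mises distribution


def bessel_i0(kappa: float, terms: int = 60) -> float:
    """Modified Bessel function I0 via its power series sum_m ((kappa/2)^m / m!)^2."""
    total, term = 1.0, 1.0
    for m in range(1, terms):
        term *= (kappa / (2.0 * m)) ** 2
        total += term
    return total


def von_mises_pdf(x: Angle, mu: float, kappa: float) -> float:
    """Density of the von Mises distribution on the circle."""
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))


@dataclass
class HiddenState:
    """One hypothetical hidden state: a prior weight plus von Mises factors."""
    weight: float       # p(h)
    phi: VM             # p(phi | h), backbone
    psi: VM             # p(psi | h), backbone
    chis: List[VM]      # p(chi_i | h), one entry per side chain dihedral


r = math.radians
# Toy parameters for a residue with two chi angles; NOT fitted to any data.
STATES = [
    HiddenState(0.6, (r(-63), 8.0), (r(-42), 8.0), [(r(-65), 10.0), (r(180), 6.0)]),
    HiddenState(0.4, (r(-120), 5.0), (r(130), 5.0), [(r(180), 10.0), (r(65), 6.0)]),
]


def posterior(states, phi: Angle, psi: Angle) -> List[float]:
    """p(h | phi, psi) by Bayes' rule over the discrete hidden state."""
    scores = [s.weight * von_mises_pdf(phi, *s.phi) * von_mises_pdf(psi, *s.psi) for s in states]
    z = sum(scores)
    return [x / z for x in scores]


def sample_chis(states, rng: random.Random,
                phi: Optional[Angle] = None, psi: Optional[Angle] = None) -> List[Angle]:
    """Draw continuous chi angles, optionally conditioned on the backbone (phi, psi)."""
    if phi is None or psi is None:
        probs = [s.weight for s in states]           # unconditional: prior over h
    else:
        probs = posterior(states, phi, psi)          # conditional: posterior over h
    state = rng.choices(states, weights=probs, k=1)[0]
    # random.vonmisesvariate returns an angle in [0, 2*pi).
    return [rng.vonmisesvariate(mu, kappa) for mu, kappa in state.chis]


def pseudo_energy(states, chis: List[Angle], phi: Angle, psi: Angle) -> float:
    """-log p(chis | phi, psi), marginalizing over the hidden state."""
    post = posterior(states, phi, psi)
    p = sum(w * math.prod(von_mises_pdf(c, mu, k) for c, (mu, k) in zip(chis, s.chis))
            for w, s in zip(post, states))
    return -math.log(p)


if __name__ == "__main__":
    rng = random.Random(0)
    helix = (r(-63), r(-42))
    print("unconditional:", [round(math.degrees(a)) for a in sample_chis(STATES, rng)])
    print("given helix:  ", [round(math.degrees(a)) for a in sample_chis(STATES, rng, *helix)])
    print("E(chi=-65,180):", pseudo_energy(STATES, [r(-65), r(180)], *helix))
    print("E(chi=180, 65):", pseudo_energy(STATES, [r(180), r(65)], *helix))
```

No angle is ever snapped to a discrete rotamer: the hidden state only selects which continuous distribution the chi angles are drawn from, and the pseudo-energy is smooth in the chi angles, so it can in principle be combined with a force field or used inside a continuous search.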
