Unbiased, scalable sampling of protein loop conformations from probabilistic priors

Background

Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in an ad hoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences.

Results

Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (> 10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints, and prior information in a unified framework. The method uses a sparse PGM that exploits the locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints.

Conclusion

Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with the specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion binding demonstrates its potential as a tool for loop ensemble generation and missing structure completion.
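To make the idea of Markov chain Monte Carlo sampling from a structural prior concrete, the following is a minimal sketch of a plain Metropolis sampler over backbone torsion angles. It is not SLIKMC: it ignores loop-closure constraints and sub-loop sampling entirely, and the prior (independent von Mises-style terms with hypothetical parameters `mu` and `kappa`) is an assumption chosen only for illustration, not the PGM described in the abstract.

```python
import math
import random

def log_prior(angles, mu, kappa):
    """Unnormalized log-density: one von Mises-style term per torsion angle.
    (Illustrative stand-in for a structural prior; an assumption, not the paper's PGM.)"""
    return sum(kappa * math.cos(a - m) for a, m in zip(angles, mu))

def wrap(angle):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def metropolis(mu, kappa, n_steps, step=0.5, seed=0):
    """Plain Metropolis sampling of torsion angles with no closure constraint."""
    rng = random.Random(seed)
    x = [0.0] * len(mu)
    lp_x = log_prior(x, mu, kappa)
    samples = []
    for _ in range(n_steps):
        # Perturb one randomly chosen angle with a symmetric uniform proposal.
        i = rng.randrange(len(x))
        y = list(x)
        y[i] = wrap(y[i] + rng.uniform(-step, step))
        lp_y = log_prior(y, mu, kappa)
        # Accept with probability min(1, p(y) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_y - lp_x)):
            x, lp_x = y, lp_y
        samples.append(list(x))
    return samples

def circular_mean(values):
    """Mean direction of a list of angles in radians."""
    s = sum(math.sin(v) for v in values)
    c = sum(math.cos(v) for v in values)
    return math.atan2(s, c)

if __name__ == "__main__":
    mu = [-1.0, 2.0]  # hypothetical preferred (phi, psi) values in radians
    chain = metropolis(mu, kappa=4.0, n_steps=20000)
    kept = chain[2000:]  # discard burn-in
    print([round(circular_mean([s[i] for s in kept]), 2) for i in range(len(mu))])
```

Because the proposal is symmetric on the circle, the acceptance rule reduces to a ratio of prior densities. A sampler of this kind would almost never satisfy loop closure by chance, which is the gap that the sub-loop sampling step described in the abstract is meant to address.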
