Progress and challenges in the automated construction of Markov state models for full protein systems.

Markov state models (MSMs) are a powerful tool for modeling both the thermodynamics and kinetics of molecular systems. In addition, they provide a rigorous means to combine information from multiple sources into a single model and to direct future simulations and experiments so as to minimize uncertainties in the model. However, constructing MSMs is challenging because doing so requires decomposing the extremely high-dimensional and rugged free energy landscape of a molecular system into long-lived states, also called metastable states. Thus, applying MSMs has generally required significant chemical intuition and hand-tuning. To address this limitation, we have developed a toolkit for automating the construction of MSMs called MSMBUILDER (available at https://simtk.org/home/msmbuilder). In this work, we demonstrate the application of MSMBUILDER to the villin headpiece (HP-35 NleNle), one of the smallest and fastest-folding proteins. We show that the resulting MSM captures both the thermodynamics and kinetics of the original molecular dynamics of the system. As a first step toward experimental validation of our methodology, we show that our model provides accurate structure prediction and that the longest-timescale events correspond to folding.
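To make concrete what it means for an MSM to capture both thermodynamics and kinetics, the following is a minimal sketch of the generic estimation step, assuming the trajectories have already been assigned to discrete states. It is not the MSMBUILDER interface: the function names, the toy trajectory, and the lag time are hypothetical, and the count symmetrization used here is only one simple way to impose detailed balance, not necessarily the estimator used in this work.

```python
import numpy as np

def count_transitions(dtrajs, n_states, lag):
    """Count transitions i -> j observed `lag` frames apart (lag >= 1)."""
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj)
        # np.add.at accumulates correctly when index pairs repeat
        np.add.at(counts, (traj[:-lag], traj[lag:]), 1.0)
    return counts

def estimate_msm(counts):
    """Return (transition matrix, stationary distribution).

    Assumption: counts are symmetrized so the model obeys detailed balance.
    Every state must have been visited, or a row sum would be zero.
    """
    sym = counts + counts.T
    row_sums = sym.sum(axis=1, keepdims=True)
    T = sym / row_sums                  # kinetics: row-stochastic matrix
    pi = row_sums.ravel() / sym.sum()   # thermodynamics: state populations
    return T, pi

def implied_timescales(T, lag):
    """Relaxation timescales, in frames, from the non-unit eigenvalues."""
    eig = np.sort(np.linalg.eigvals(T).real)[::-1]
    eig = eig[1:]                       # drop the stationary eigenvalue (1)
    eig = eig[(eig > 0.0) & (eig < 1.0)]
    return -lag / np.log(eig)

# Hypothetical two-state trajectory (state index per saved frame).
dtrajs = [[0, 0, 0, 1, 1, 1, 0, 0, 1, 1]]
lag = 1
counts = count_transitions(dtrajs, n_states=2, lag=lag)
T, pi = estimate_msm(counts)
timescales = implied_timescales(T, lag)
```

In this picture the stationary distribution `pi` carries the equilibrium populations of the states, while the slowest implied timescale corresponds to the longest-lived process in the model, which the abstract identifies with folding for the villin headpiece.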
