Evaluation and optimization of discrete state models of protein folding.

The conformational space accessed by a folding macromolecule is vast, and how best to project computer simulations of protein folding trajectories onto an interpretable sequence of discrete states is an open research problem. There are numerous alternative ways of grouping individual configurations into collective states, and in choosing the number of such clustered states there is a trade-off between human interpretability (fewer states) and accuracy of representation (more states). Here we introduce a trajectory likelihood measure for assessing alternative discrete-state models of protein folding. We find that widely used RMSD-based clustering methods require large numbers of initial states, followed by a second agglomeration step based on kinetic connectivity, to produce models with high predictive power; this is the approach taken in elegant recent work with Markov State Models of protein folding. In contrast, we find that grouping states by secondary-structure pairings or contact maps, when refined with K-means clustering, yields higher-likelihood models with far fewer states. Using the most predictive contact-map representation to study the folding transitions of the WW domain in very long molecular dynamics simulations, we identify new states and transitions. The methods should be generally useful for investigating structural transitions in folding simulations of larger proteins.
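
The abstract does not define the trajectory likelihood measure itself. As a minimal sketch, assuming the score is the average log-probability that a first-order Markov chain fit to one portion of a trajectory assigns to the state-to-state transitions observed in held-out frames, it could be computed as below. The function names, the one-frame lag, and the pseudocount regularization are illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def estimate_transition_matrix(states, n_states, lag=1, pseudocount=1.0):
    """Row-stochastic transition matrix fit to one discrete-state trajectory.

    Assumption of this sketch: a uniform pseudocount is added to every i -> j
    entry so transitions never seen in training keep nonzero probability.
    """
    states = np.asarray(states, dtype=int)
    counts = np.full((n_states, n_states), pseudocount, dtype=float)
    # Accumulate one count per observed transition `lag` frames apart;
    # np.add.at handles repeated (i, j) index pairs correctly.
    np.add.at(counts, (states[:-lag], states[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def trajectory_log_likelihood(states, T, lag=1):
    """Mean natural-log probability per transition of `states` under `T`."""
    states = np.asarray(states, dtype=int)
    return float(np.mean(np.log(T[states[:-lag], states[lag:]])))


# Usage: fit on one segment of a toy two-state trajectory, score another.
train = [0] * 50 + [1] * 50  # long dwell in each state
held_out = [0] * 20 + [1] * 20
T = estimate_transition_matrix(train, n_states=2)
ll_fit = trajectory_log_likelihood(held_out, T)
ll_uniform = trajectory_log_likelihood(held_out, np.full((2, 2), 0.5))
print(ll_fit, ll_uniform)  # the fitted model scores higher than the uniform one
```

This raw score is not by itself comparable across state decompositions of different sizes: a single-state model trivially attains the maximum log-likelihood of zero. Comparing alternative clusterings, such as RMSD-based versus contact-map states, therefore requires a measure that penalizes overly coarse models. This sketch covers only scoring a fixed state assignment.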
