Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions.

Using the helix-coil transitions of alanine pentapeptide (Ala5) as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small but nontrivial differences that reflect torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model that describes the conformational dynamics as a discrete-time random walk between the clusters. We show that combining fuzzy C-means clustering with a transition-based assignment of states yields robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance the informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
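As a rough illustration of two of the steps named above, the following minimal Python sketch computes a basic diffusion-map embedding and estimates a discrete-time transition matrix from a sequence of state labels. It is not the implementation used in this work. The function names `diffusion_map` and `transition_matrix`, the Gaussian kernel with a single global bandwidth `eps`, the lag parameter `lag`, and the synthetic two-blob data are all assumptions made for the example. Fuzzy C-means clustering and the transition-based state assignment are not shown.

```python
import numpy as np


def diffusion_map(X, eps, n_coords=2):
    """Basic diffusion map of the rows of X (Gaussian kernel, global bandwidth).

    Returns the leading nontrivial eigenvalues and the corresponding
    diffusion coordinates (eigenvectors scaled by their eigenvalues).
    """
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-d2 / (2.0 * eps))
    d = K.sum(axis=1)
    # S = D^{-1/2} K D^{-1/2} is symmetric and shares its eigenvalues
    # with the row-stochastic Markov matrix M = D^{-1} K.
    S = K / np.sqrt(np.outer(d, d))
    w, v = np.linalg.eigh(S)
    order = np.argsort(w)[::-1]
    w, v = w[order], v[:, order]
    # Right eigenvectors of M are D^{-1/2} times the eigenvectors of S.
    psi = v / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return w[1:n_coords + 1], psi[:, 1:n_coords + 1] * w[1:n_coords + 1]


def transition_matrix(states, n_states, lag=1):
    """Row-normalized transition counts at the given lag (no reversibility constraint)."""
    states = np.asarray(states, dtype=int)
    C = np.zeros((n_states, n_states))
    # Count every observed transition i -> j separated by `lag` steps.
    np.add.at(C, (states[:-lag], states[lag:]), 1.0)
    rows = C.sum(axis=1, keepdims=True)
    # Rows without any counts are left as zeros.
    return np.divide(C, rows, out=np.zeros_like(C), where=rows > 0)


# Synthetic stand-in for conformational data: two separated blobs in 3D.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(50, 3))
blob_b = rng.normal(0.0, 0.3, size=(50, 3)) + np.array([4.0, 0.0, 0.0])
X = np.vstack([blob_a, blob_b])

eigvals, coords = diffusion_map(X, eps=1.0, n_coords=2)

# Toy state sequence standing in for cluster labels along a trajectory.
T = transition_matrix([0, 0, 1, 1, 0], n_states=2, lag=1)
```

In this toy setting the first diffusion coordinate takes opposite signs on the two blobs, which is the property that makes such coordinates useful as input to a subsequent clustering step. For real trajectories, the distance metric (for example, an aligned RMSD instead of the Euclidean distance used here) and the bandwidth would need to be chosen with care.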
