Analyzing Molecular Dynamics Trajectories Thermodynamically through Artificial Intelligence

Molecular dynamics simulations of biochemical processes produce trajectories comprising vast numbers of structures. Extracting valuable information from these trajectories, such as important intermediate states and the collective variables (CVs) that describe the major modes of motion, is a significant challenge for understanding the mechanisms underlying biological processes. To this end, we introduce a deep learning approach, coined DIKI (deep identification of key intermediates), that determines low-dimensional CVs distinguishing key intermediate conformations without a priori assumptions. DIKI dynamically plans the distribution of the latent space and groups similar conformations into the same cluster. Two user-defined parameters, the coarse focus knob and the fine focus knob, help identify low-free-energy conformations and resolve the subtle distinctions among them, thereby achieving resolution-tunable clustering. Furthermore, integrating DIKI with a path-finding algorithm helps identify crucial intermediates along the lowest free-energy pathway. We postulate that DIKI is a robust and flexible tool that can find widespread application in the analysis of complex biochemical processes.
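To make the idea of a lowest free-energy pathway on a low-dimensional CV space concrete, the following is a minimal sketch and not the DIKI implementation. It assumes two-dimensional CV samples, estimates a free-energy surface from a histogram via F = -kT ln p, and uses Dijkstra's algorithm with the summed free energy of visited bins as the path cost, which is only one possible proxy for such a pathway. The function names `free_energy_grid` and `lowest_path`, the bin count, and the temperature are illustrative assumptions.

```python
import heapq
import numpy as np

KT = 0.593  # kcal/mol near 298 K (assumed temperature)

def free_energy_grid(cv, bins=20, kT=KT):
    """Histogram 2-D CV samples and convert bin populations to free energy."""
    hist, _, _ = np.histogram2d(cv[:, 0], cv[:, 1], bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        fe = -kT * np.log(prob)  # unvisited bins become +inf
    return fe - fe[np.isfinite(fe)].min()  # shift so the global minimum is 0

def lowest_path(fe, start, goal):
    """Dijkstra on a 4-connected grid; entering a bin costs its free energy."""
    n_rows, n_cols = fe.shape
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist[node]:
            continue  # stale heap entry
        i, j = node
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if not (0 <= ni < n_rows and 0 <= nj < n_cols):
                continue
            step = float(fe[ni, nj])
            if not np.isfinite(step):
                continue  # skip bins that were never sampled
            nd = d + step
            if nd < dist.get((ni, nj), np.inf):
                dist[(ni, nj)] = nd
                prev[(ni, nj)] = node
                heapq.heappush(heap, (nd, (ni, nj)))
    if goal not in dist:
        return []  # goal unreachable through sampled bins
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy surface: a barrier (5 kcal/mol) separates two basins in the top rows.
toy = np.array([[0.0, 5.0, 0.0],
                [0.0, 5.0, 0.0],
                [0.0, 0.0, 0.0]])
toy_path = lowest_path(toy, (0, 0), (0, 2))
```

On the toy surface, the returned path runs around the barrier through the bottom row instead of crossing it. Bins along such a path that coincide with local minima would be natural candidates for intermediates, although how intermediates are actually selected depends on the method used.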
