Decision functions from supervised machine learning algorithms as collective variables for accelerating molecular simulations

Selection of appropriate collective variables (CVs) for enhancing molecular simulations remains an unsolved problem in computational biophysics. Picking initial CVs is particularly challenging in higher dimensions. Which atomic coordinates, or transforms thereof, from a list of thousands should one pick for enhanced sampling runs? How does a modeler even begin to pick starting coordinates for investigation? This remains true even for simple two-state systems and only increases in difficulty for multi-state systems. In this work, we attempt to solve the initial CV problem using a data-driven approach inspired by the supervised machine learning literature. In particular, we show how the decision functions of supervised machine learning (SML) algorithms can be used as initial CVs for accelerated sampling. Using solvated alanine dipeptide and the Chignolin mini-protein as our test cases, we illustrate how the distance to a Support Vector Machine's decision hyperplane, the output probability estimates from Logistic Regression, and the outputs of other classifiers may be used to reversibly sample slow structural transitions. We also discuss other SML algorithms that might be useful for identifying CVs for accelerating molecular simulations.
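To make the idea of a classifier decision function acting as a CV concrete, the following is a minimal sketch and not the implementation used in this work. It assumes two hypothetical metastable states described by synthetic backbone dihedrals (the basin centers, widths, and all function names are illustrative assumptions), and it trains a plain logistic regression by gradient descent using only the Python standard library. The fitted linear function yields two candidate CVs: the class probability, and the signed distance to the decision hyperplane, which is the same kind of quantity a linear SVM would expose.

```python
import math
import random


def featurize(phi, psi):
    """Map backbone dihedrals (radians) to periodic sin/cos features."""
    return [math.sin(phi), math.cos(phi), math.sin(psi), math.cos(psi)]


def make_toy_frames(n_per_state=200, seed=0):
    """Synthetic frames for two states; basin centers are assumed values."""
    rng = random.Random(seed)
    centers = {0: (-1.4, 2.6), 1: (1.0, -1.0)}  # hypothetical (phi, psi)
    frames, labels = [], []
    for label, (phi0, psi0) in centers.items():
        for _ in range(n_per_state):
            phi = rng.gauss(phi0, 0.3)
            psi = rng.gauss(psi0, 0.3)
            frames.append(featurize(phi, psi))
            labels.append(label)
    return frames, labels


def decision_function(w, b, x):
    """Linear decision function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Batch gradient descent on the L2-regularized logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(decision_function(w, b, x)) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def cv_probability(w, b, phi, psi):
    """CV 1: estimated probability of being in state 1."""
    return sigmoid(decision_function(w, b, featurize(phi, psi)))


def cv_signed_distance(w, b, phi, psi):
    """CV 2: signed distance to the decision hyperplane in feature space."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return decision_function(w, b, featurize(phi, psi)) / norm


if __name__ == "__main__":
    X, y = make_toy_frames()
    w, b = fit_logistic(X, y)
    for name, (phi, psi) in {"state 0": (-1.4, 2.6), "state 1": (1.0, -1.0)}.items():
        print(name, cv_probability(w, b, phi, psi), cv_signed_distance(w, b, phi, psi))
```

Both CVs are smooth, differentiable functions of the input dihedrals, which is what an enhanced sampling method needs in order to apply a bias force along them. In practice one would likely train the classifier with a library such as scikit-learn on features extracted from short simulations of each state, and then express the fitted function in the enhanced sampling code.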
