Collective variable discovery and enhanced sampling using autoencoders: Innovations in network architecture and error function design

Auto-associative neural networks ("autoencoders") are a powerful nonlinear dimensionality reduction technique for mining data-driven collective variables from molecular simulation trajectories. The technique furnishes explicit and differentiable expressions for the nonlinear collective variables, which makes it well suited to integration with enhanced sampling techniques for accelerated exploration of configurational space. In this work, we describe several refinements of the neural network architecture that improve and generalize the process of interleaved collective variable discovery and enhanced sampling. We employ circular network nodes to accommodate periodicities in the collective variables, hierarchical network architectures to rank-order the collective variables, and generalized encoder-decoder architectures that support bespoke error functions for network training, allowing prior knowledge to be incorporated. We demonstrate our approach in blind collective variable discovery and enhanced sampling of the configurational free energy landscapes of alanine dipeptide and Trp-cage using an open-source plugin developed for the OpenMM molecular simulation package.
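
To make two of the architectural ideas named above concrete, here is a minimal NumPy sketch, assuming a generic encoder that emits bottleneck pre-activations and a user-supplied decoder callable. The names `circular_node` and `hierarchical_loss` are hypothetical and are not the API of the OpenMM plugin. The circular node projects a pair of pre-activations onto the unit circle so that the associated angle is periodic. The hierarchical loss zeroes out all but the first k bottleneck units before decoding and sums the reconstruction errors over k; this masking scheme is one common way to implement hierarchical training and is an assumption here, not necessarily the exact error function used in this work.

```python
import numpy as np


def circular_node(p, q, eps=1e-12):
    """Project a pair of pre-activations (p, q) onto the unit circle.

    Returns the normalized pair (cos-like, sin-like) and the angle
    theta = atan2(q, p) in [-pi, pi], which behaves as a periodic
    collective variable (e.g. a backbone dihedral).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    r = np.maximum(np.hypot(p, q), eps)  # guard against division by zero
    return p / r, q / r, np.arctan2(q, p)


def hierarchical_loss(x, z, decode):
    """Hierarchical reconstruction error over nested bottleneck subsets.

    For k = 1..K, bottleneck units with index >= k are zeroed before
    decoding, and the mean squared reconstruction error is accumulated.
    The first unit contributes to every term, so training pushes it to
    capture the most variance, the second the next most, and so on.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n_latent = z.shape[1]
    total = 0.0
    for k in range(1, n_latent + 1):
        z_k = z.copy()
        z_k[:, k:] = 0.0  # keep only the first k latent units
        residual = x - decode(z_k)
        total += np.mean(np.sum(residual**2, axis=1))
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "dihedral" data: an angle phi embedded as (cos phi, sin phi).
    phi = rng.uniform(-np.pi, np.pi, size=5)
    _, _, theta = circular_node(np.cos(phi), np.sin(phi))
    print(np.allclose(theta, phi))  # the circular node recovers the angle

    # Identity decoder on 2-D latents: the loss depends on latent ordering.
    z = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(hierarchical_loss(z, z, lambda u: u))
```

Because both operations are differentiable almost everywhere, a bias potential defined on `theta` or on the leading latent units can, in principle, be propagated back to atomic coordinates by the chain rule, which is what makes autoencoder collective variables usable inside biased simulations.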
