Runtime Steering of Molecular Dynamics Simulations Through In Situ Analysis and Annotation of Collective Variables

This paper targets one of the most common classes of simulation on petascale and, very likely, exascale machines: molecular dynamics (MD) simulations, which study the classical time evolution of a molecular system at atomic resolution. Specifically, this work addresses the data challenges of MD simulations at exascale through (1) a data analysis method based on a suite of advanced collective variables (CVs) selected to annotate structural molecular properties and capture rare conformational events at runtime; (2) an in situ framework that automatically identifies the frames in which rare events occur during an MD simulation; and (3) the integration of both method and framework into two MD workflows, one with early termination and one with termination and restart, applied on Summit to a benchmark molecular system for protein folding, the Fs peptide (Ace-A_5(AAARA)_3A-NME). The approach explores the conformational space faster than extensive ensemble simulations. With early termination alone, our in situ framework achieves 99.6% coverage of the reference conformational space for the Fs peptide using just 60% of the MD steps required by a traditional execution of the MD simulation. Annotation-based restart covers 94.6% of the conformational space while running only 50% of the overall MD steps.
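The abstract does not state how coverage is measured or which rule triggers early termination, so the following is only an illustrative sketch under stated assumptions. It assumes that each frame is annotated with a tuple of CV values, that the conformational space is discretized into a regular grid of cells, that coverage is the fraction of reference cells visited, and that a run stops once no new cell has appeared for a fixed number of frames. The names `bin_of`, `coverage`, `run_with_early_termination`, `width`, and `patience` are hypothetical and are not taken from the paper.

```python
import math


def bin_of(cv, width=0.5):
    """Map a tuple of CV values to a grid cell (assumed discretization)."""
    return tuple(math.floor(v / width) for v in cv)


def coverage(visited, reference):
    """Fraction of the reference cells that the run has visited."""
    return len(visited & reference) / len(reference)


def run_with_early_termination(frames, reference, patience=200, width=0.5):
    """Consume frames in order and stop once no new cell has been seen
    for `patience` consecutive frames (assumed stopping rule).

    Returns the number of frames consumed and the coverage reached.
    """
    visited = set()
    since_new = 0
    steps = 0
    for cv in frames:
        steps += 1
        cell = bin_of(cv, width)
        if cell in visited:
            since_new += 1
        else:
            visited.add(cell)
            since_new = 0
        if since_new >= patience:
            break  # early termination: the run has stopped finding new cells
    return steps, coverage(visited, reference)


# Toy trajectory (synthetic, not simulation data): cycles through four cells.
frames = [((i % 4) * 0.5 + 0.1, 0.1) for i in range(1000)]
# Toy reference space with five cells, one of which is never reached.
reference = {(k, 0) for k in range(5)}

steps, cov = run_with_early_termination(frames, reference, patience=10)
print(f"stopped after {steps} of {len(frames)} frames, coverage = {cov:.1%}")
```

In this toy setting the run stops long before the 1000 available frames are consumed, while one reference cell stays uncovered. This mirrors the trade-off reported above between the fraction of MD steps executed and the coverage achieved.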
