Stream-AI-MD: streaming AI-driven adaptive molecular simulations for heterogeneous computing platforms

Emerging hardware tailored for artificial intelligence (AI) and machine learning (ML) methods provides novel means to couple them with traditional high performance computing (HPC) workflows involving molecular dynamics (MD) simulations. We propose Stream-AI-MD, a novel instance of applying deep learning methods to drive adaptive MD simulation campaigns in a streaming manner. We run ensemble MD simulations on GPU clusters and continuously stream the data from these atomistic simulations to AI/ML models on a wafer-scale AI accelerator, which guide the conformational search in a biophysically meaningful manner. We demonstrate the efficacy of Stream-AI-MD simulations for two scientific use cases: (1) folding a small prototypical protein, namely the ββα-fold (BBA) FSD-EY, and (2) understanding the protein-protein interaction (PPI) between two SARS-CoV-2 proteins, nsp16 and nsp10. We show that Stream-AI-MD simulations can improve time-to-solution by ~50× for BBA protein folding. We also discuss the performance trade-offs involved in implementing AI-coupled HPC workflows on heterogeneous computing architectures.
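
The abstract describes the coupling only at a high level: ensemble simulations stream frames to a learning component that decides where to sample next. The following is a minimal sketch of that control flow under explicit assumptions, not the Stream-AI-MD implementation. Each "simulation" is a hypothetical 2-D Gaussian random walk standing in for an MD trajectory, the deep learning model is replaced by a simple k-nearest-neighbour novelty score, and frames flow through a thread-safe `queue.Queue`. The names `simulate`, `novelty`, and `run_round` are illustrative only.

```python
"""Toy sketch of a streaming, AI-guided adaptive sampling loop.

Assumptions (hypothetical, not taken from Stream-AI-MD):
- each "simulation" is a 2-D Gaussian random walk standing in for MD;
- the deep learning model is replaced by a k-nearest-neighbour novelty score;
- frames are streamed from worker threads through a queue.Queue.
"""
import math
import queue
import random
import threading
from typing import List, Tuple

Frame = Tuple[float, float]
_DONE = object()  # sentinel marking the end of one simulation's stream


def simulate(start: Frame, n_steps: int, seed: int, out: queue.Queue) -> None:
    """Hypothetical MD stand-in: a random walk that streams every frame it produces."""
    rng = random.Random(seed)
    x, y = start
    for _ in range(n_steps):
        x += rng.gauss(0.0, 0.1)
        y += rng.gauss(0.0, 0.1)
        out.put((x, y))
    out.put(_DONE)


def novelty(frame: Frame, history: List[Frame], k: int = 5) -> float:
    """Mean distance to the (up to) k nearest frames seen so far; higher means more novel."""
    if not history:
        return math.inf
    nearest = sorted(math.dist(frame, h) for h in history)[:k]
    return sum(nearest) / len(nearest)


def run_round(starts: List[Frame], n_steps: int, seed: int,
              history: List[Frame]) -> List[Frame]:
    """Run one ensemble, score each frame as it arrives, return the most novel frames."""
    stream: queue.Queue = queue.Queue()
    workers = [threading.Thread(target=simulate, args=(s, n_steps, seed + i, stream))
               for i, s in enumerate(starts)]
    for w in workers:
        w.start()
    scored = []
    finished = 0
    while finished < len(workers):
        item = stream.get()
        if item is _DONE:
            finished += 1
            continue
        scored.append((novelty(item, history), item))
        history.append(item)  # the "model" state is updated online as frames stream in
    for w in workers:
        w.join()
    # Restart the next round from as many of the most novel frames as there were walkers.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [frame for _, frame in scored[:len(starts)]]


if __name__ == "__main__":
    history: List[Frame] = []
    starts: List[Frame] = [(0.0, 0.0)] * 4
    for r in range(3):
        starts = run_round(starts, n_steps=50, seed=100 * r, history=history)
    print(len(history))  # 600 (4 walkers x 50 steps x 3 rounds)
```

Scoring each frame on arrival, rather than after a whole round finishes, is what makes the loop streaming. Because frames from different walkers interleave nondeterministically, the selected restart points can vary between runs; a real deployment that places the scorer on separate hardware, such as the wafer-scale accelerator mentioned above, would face the same ordering question.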
