Deep Generative Model Driven Protein Folding Simulations

Significant progress in computer hardware and software has enabled molecular dynamics (MD) simulations to model complex biological phenomena such as protein folding. However, enabling MD simulations to access biologically relevant timescales (e.g., beyond milliseconds) remains challenging. The key challenges include (1) quantifying which states have already been (sufficiently) sampled in an ensemble of MD runs, and (2) identifying novel states from which new simulations can be initiated to sample rare events (e.g., folding events). Given the recent success of deep learning and artificial intelligence techniques in analyzing large datasets, we posit that these techniques can also be used to adaptively guide MD simulations to model such complex biological phenomena. Leveraging our recently developed unsupervised deep learning technique for clustering protein folding trajectories into partially folded intermediates, we build an iterative workflow that couples our generative model with all-atom MD simulations to fold small protein systems on emerging high-performance computing platforms. We demonstrate our approach by folding Fs-peptide and the ββα (BBA) fold FSD-EY. Our adaptive workflow achieves an overall root-mean-square deviation (RMSD) to the native state of 1.6 Å for Fs-peptide and 4.4 Å for FSD-EY. We also highlight emerging challenges in designing scalable workflows when data-intensive deep learning techniques are coupled to compute-intensive MD simulations.
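The iterative loop described above follows a general pattern. Simulated conformations are embedded in a low-dimensional latent space, and the least-sampled (most novel) states are chosen as seeds for the next round of MD. Progress is then measured as RMSD to the native state. The sketch below illustrates that pattern only; it is not the authors' implementation, and every function name is hypothetical. It rests on three stated assumptions:

- A PCA projection of pairwise-distance features stands in for the trained deep generative model's encoder.
- The mean distance to the k nearest latent-space neighbours serves as the novelty score.
- RMSD is computed after optimal rigid-body superposition (Kabsch algorithm).

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) coordinate sets after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def distance_features(frames: np.ndarray) -> np.ndarray:
    """Upper-triangle pairwise distances per frame, shape (F, M).
    Invariant to rotation/translation, so frames need no prior alignment."""
    i, j = np.triu_indices(frames.shape[1], k=1)
    d = np.linalg.norm(frames[:, :, None, :] - frames[:, None, :, :], axis=-1)
    return d[:, i, j]


def latent_embedding(features: np.ndarray, dim: int = 3) -> np.ndarray:
    """HYPOTHETICAL stand-in for a trained generative model's encoder:
    project centered features onto their top `dim` principal components."""
    X = features - features.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:dim].T


def novelty_scores(z: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean distance to the k nearest neighbours in latent space;
    larger values mean the state is less densely sampled."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, 1:k + 1].mean(axis=1)  # column 0 is the self-distance


def pick_restart_frames(frames, native, n_restarts=4, k=5):
    """Select the most novel frames as seeds for the next round of MD and
    report each seed's RMSD to the native structure."""
    z = latent_embedding(distance_features(frames))
    scores = novelty_scores(z, k)
    seeds = np.argsort(scores)[::-1][:n_restarts]  # most novel first
    rmsds = np.array([kabsch_rmsd(frames[s], native) for s in seeds])
    return seeds, rmsds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    native = rng.uniform(0.0, 10.0, size=(10, 3))  # toy 10-atom "structure"
    # 20 near-native frames plus one expanded, "unfolded" frame (index 20).
    frames = np.stack([native + rng.normal(0.0, 0.1, native.shape)
                       for _ in range(20)] + [3.0 * native])
    seeds, rmsds = pick_restart_frames(frames, native, n_restarts=3)
    for s, r in zip(seeds, rmsds):
        print(f"restart from frame {s}: RMSD to native = {r:.2f} A")
```

The loop structure is unchanged if the stand-in pieces are swapped out. The PCA projection could be replaced by a trained variational autoencoder's latent space, or the k-NN score by a density-based clustering outlier label. In every case, each round embeds the sampled frames, scores novelty, restarts MD from the top-scoring frames, and repeats.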
