Accelerated estimation of long-timescale kinetics from weighted ensemble simulation via non-Markovian "microbin" analysis.

The weighted ensemble (WE) simulation strategy provides unbiased sampling of non-equilibrium processes, such as molecular folding or binding, but the extraction of rate constants relies on characterizing steady-state behavior. Unfortunately, WE simulations of sufficiently complex systems will not relax to steady state within the simulation times that can be observed. Here we show that a post-simulation clustering of molecular configurations into "microbins", using methods developed in the Markov State Model (MSM) community, can yield unbiased kinetics from WE data before the WE simulation itself has converged to steady state. Because WE trajectories are directional and not equilibrium-distributed, we use the history-augmented MSM (haMSM) formulation, which yields the mean first-passage time (MFPT) without bias for arbitrarily small lag times. Accurate kinetics can thus be obtained while bypassing the often prohibitive convergence requirements of the non-equilibrium weighted ensemble. We validate the method on a simple diffusive process on a 2D random energy landscape, and then analyze atomistic protein folding simulations generated with WE molecular dynamics. We report significant progress towards the unbiased estimation of protein folding times and pathways, though key challenges remain.
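
To illustrate how an MFPT can be read off a microbin transition matrix at steady state, the following is a minimal sketch and not the authors' implementation. It assumes the matrix has already been estimated from weighted microbin transitions, that the source and target are each a single microbin, and that probability reaching the target is recycled to the source. The function names (`transition_matrix`, `steady_state_mfpt`, `direct_mfpt`) and the three-bin toy model are hypothetical. The MFPT is taken as the lag time divided by the steady-state flux into the target, and it is checked against a direct linear solve of the first-passage equations.

```python
import numpy as np

def transition_matrix(pairs, weights, n_bins):
    """Row-normalized matrix from weighted (i, j) microbin transitions."""
    counts = np.zeros((n_bins, n_bins))
    i, j = np.asarray(pairs).T
    np.add.at(counts, (i, j), weights)  # accumulate trajectory weights, not raw counts
    rows = counts.sum(axis=1, keepdims=True)
    # Bins with no outgoing data get a self-loop so each row sums to one.
    return np.where(rows > 0, counts / np.where(rows > 0, rows, 1.0), np.eye(n_bins))

def steady_state_mfpt(T, source, target, lag=1.0):
    """MFPT as lag / (steady-state flux into target) under recycling to source."""
    T = np.asarray(T, dtype=float)
    keep = [k for k in range(T.shape[0]) if k != target]
    R = T[np.ix_(keep, keep)].copy()
    # Recycling: probability that would enter the target is fed back to the source.
    R[:, keep.index(source)] += T[keep, target]
    # Stationary distribution: left eigenvector of R with eigenvalue 1.
    evals, evecs = np.linalg.eig(R.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi /= pi.sum()
    flux = pi @ T[keep, target]  # probability arriving at the target per lag time
    return lag / flux

def direct_mfpt(T, source, target, lag=1.0):
    """Reference value: solve (I - Q) m = 1 over the non-target bins."""
    T = np.asarray(T, dtype=float)
    keep = [k for k in range(T.shape[0]) if k != target]
    Q = T[np.ix_(keep, keep)]
    m = np.linalg.solve(np.eye(len(keep)) - Q, np.ones(len(keep)))
    return lag * m[keep.index(source)]

# Toy data (hypothetical): weighted transitions among three microbins.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]
weights = [0.5, 0.5, 0.25, 0.5, 0.25]
T = transition_matrix(pairs, weights, n_bins=3)
print(steady_state_mfpt(T, source=0, target=2), direct_mfpt(T, source=0, target=2))
```

The two estimates agree for this toy model because the recycled chain is a renewal process whose cycle length is the first-passage time. With real WE data the matrix would have to be built from history-labeled trajectories for this relation to hold without Markovian bias.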
