Markov state models from short non-equilibrium simulations—Analysis and correction of estimation bias

Many state-of-the-art methods for the thermodynamic and kinetic characterization of large and complex biomolecular systems by simulation rely on ensemble approaches, in which data from large numbers of relatively short trajectories are integrated. In this context, Markov state models (MSMs) are extremely popular because they can be used to compute stationary quantities and long-time kinetics from ensembles of short simulations, provided that these short simulations are in “local equilibrium” within the MSM states. However, in the 15 years since MSMs were introduced, it has remained controversial and unresolved how deviations from local equilibrium can be detected, whether these deviations induce a practical bias in MSM estimation, and how to correct for them. In this paper, we address these issues. We systematically analyze the estimation of MSMs from short non-equilibrium simulations, and we provide an expression for the error between the unbiased transition probabilities and the expected estimate from many short simulations. We show that the unbiased MSM estimate can be obtained even from relatively short non-equilibrium simulations in the limit of long lag times and good discretization. Further, we exploit observable operator model (OOM) theory to derive an unbiased estimator for the MSM transition matrix that corrects for the effect of starting out of equilibrium, even when short lag times are used. Finally, we show how the OOM framework can be used to estimate the exact eigenvalues or relaxation time scales of the system without estimating an MSM transition matrix, which allows us to assess the discretization quality of the MSM in practice. Applications to model systems and molecular dynamics simulation data of alanine dipeptide are included for illustration. The improved MSM estimator is implemented in PyEMMA version 2.3.
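The sketch below illustrates, under loudly stated assumptions, the difference between the standard MSM estimate and an eigenvalue estimate in the spirit of the OOM approach. It uses a hypothetical four-microstate toy model, not one of the systems studied in the paper. Instead of simulating trajectories, it uses the expected (infinite-data) count matrices C(τ) = χᵀ diag(ρ) T^τ χ, where χ is the 0/1 membership matrix of the MSM discretization and ρ is the microstate distribution at the time origins of the counted transitions (ρ ≠ π for short runs started out of equilibrium). The eigenvalue estimate is a simplified spectral formula built from C(τ) and C(2τ) at the same time origins. It covers only the eigenvalue (relaxation time scale) part of the abstract, not the corrected transition-matrix estimator. The helper names `coarse_count_matrix`, `naive_msm`, `second_eigenvalue` and `oom_eigenvalues` are illustrative and are not part of the PyEMMA API.

```python
import numpy as np

# Hypothetical toy model: 4 hidden microstates with T = M @ Q, so T has rank 2
# and its only nonzero eigenvalues are those of Q @ M, namely 1 and 0.48.
M = np.array([[0.9, 0.1], [0.5, 0.5], [0.5, 0.5], [0.1, 0.9]])
Q = np.array([[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]])
T = M @ Q
# MSM discretization: microstates {0, 1} -> state 0, {2, 3} -> state 1.
chi = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
pi = np.array([0.4, 0.1, 0.1, 0.4])           # stationary distribution of T
rho_neq = np.array([0.05, 0.45, 0.45, 0.05])  # short runs started near the barrier


def coarse_count_matrix(T, rho, chi, lag):
    """Expected (infinite-data) MSM count matrix C = chi^T diag(rho) T^lag chi."""
    return chi.T @ np.diag(rho) @ np.linalg.matrix_power(T, lag) @ chi


def naive_msm(C):
    """Standard MSM estimate: row-normalize the count matrix."""
    return C / C.sum(axis=1, keepdims=True)


def second_eigenvalue(P):
    """Second-largest eigenvalue (real part) of a small transition matrix."""
    return np.sort(np.linalg.eigvals(P).real)[::-1][1]


def oom_eigenvalues(C1, C2, rank):
    """Spectral estimate from C(tau) and C(2 tau) at the same time origins.

    With C(tau) = U S V^T truncated to `rank`, F1 = U S^-1/2, F2 = V S^-1/2,
    the eigenvalues of Xi = F1^T C(2 tau) F2 estimate lambda_i(tau).
    """
    U, s, Vt = np.linalg.svd(C1)
    F1 = U[:, :rank] / np.sqrt(s[:rank])
    F2 = Vt[:rank].T / np.sqrt(s[:rank])
    Xi = F1.T @ C2 @ F2
    return np.sort(np.linalg.eigvals(Xi).real)[::-1]


tau = 1
for label, rho in [("equilibrium starts", pi), ("non-equilibrium starts", rho_neq)]:
    C1 = coarse_count_matrix(T, rho, chi, tau)
    C2 = coarse_count_matrix(T, rho, chi, 2 * tau)  # same time origins as C1
    print(f"{label}: naive MSM lambda_2 = {second_eigenvalue(naive_msm(C1)):.3f}, "
          f"OOM lambda_2 = {oom_eigenvalues(C1, C2, rank=2)[1]:.3f}")
print(f"exact lambda_2(tau) = {0.48 ** tau:.3f}")
```

With these numbers, the row-normalized MSM gives a second eigenvalue of about 0.384 from equilibrium starts, which is pure discretization error. From the non-equilibrium starts it gives about 0.048, which adds the bias from starting out of equilibrium. The spectral estimate returns 0.48 in both cases. That exactness depends on the toy T having only two nonzero eigenvalues, so the rank-2 estimate captures every process. In a realistic system it holds only approximately, roughly when the neglected fast processes have decayed at the chosen lag time. Implied relaxation time scales then follow from t_i = -τ / ln|λ_i|.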
