Statistical model selection for Markov models of biomolecular dynamics.

Markov state models provide a powerful framework for analyzing biomolecular conformational dynamics in terms of metastable states and the transition rates between them. Through a master equation, these models describe the long-timescale dynamics in large molecular dynamics datasets in a way that is both quantitative and comprehensible. They have been used successfully to study protein folding, protein conformational change, and protein-ligand binding. However, existing methodologies often require expert intervention in defining the model's discrete state space to achieve satisfactory performance. Standard model selection methodologies focus on minimizing systematic bias and disregard statistical error. We show that both sources of error can be balanced evenhandedly by considering each state's conditional distribution over conformations. We applied techniques that account for both systematic bias and statistical error to two 100 μs molecular dynamics trajectories of the Fip35 WW domain. They agree with existing techniques based on the self-consistency of the model's relaxation timescales, and they give more suitable results in regimes where those timescale-based techniques encourage overfitting. By removing the need for expert tuning, these methods should reduce modeling bias and lower the barriers to entry in Markov state model construction.
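The central idea can be illustrated on a toy problem. A state decomposition is scored by a likelihood that includes each state's conditional distribution over conformations, and the parameter count is then penalized. The sketch below is not the authors' implementation. It rests on several assumptions:

- a one-dimensional coordinate;
- states that are intervals of that coordinate;
- a uniform conditional density within each state, so that log p(x | s) = -log V_s, where V_s is the state's width;
- n(n-1) free transition probabilities for an n-state model.

AIC and BIC are then applied in their standard forms. The names `assign_states`, `log_likelihood`, `information_criteria`, and the double-well `toy_trajectory` are hypothetical and introduced only for illustration.

```python
import math
import random
from bisect import bisect_right


def assign_states(xs, edges):
    """Index of the interval state containing each point.

    `edges` are sorted interior boundaries; a point lying exactly on an
    edge is assigned to the state on its right.
    """
    return [bisect_right(edges, x) for x in xs]


def log_likelihood(xs, edges, lo, hi):
    """Maximized log-likelihood of a 1D trajectory under an MSM whose states
    are the intervals [lo, e0), [e0, e1), ..., [e_last, hi].

    ASSUMPTION (illustrative): each state's conditional density over
    conformations is uniform on its interval, i.e. log p(x | s) = -log V_s.
    The likelihood is conditioned on the first frame.
    """
    bounds = [lo] + list(edges) + [hi]
    n = len(bounds) - 1
    volumes = [bounds[i + 1] - bounds[i] for i in range(n)]
    s = assign_states(xs, edges)

    # Lag-1 transition counts; the MLE transition matrix is their row normalization.
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(s[:-1], s[1:]):
        counts[a][b] += 1
    row_sums = [sum(row) for row in counts]

    ll = 0.0
    for a, b in zip(s[:-1], s[1:]):
        ll += math.log(counts[a][b] / row_sums[a])  # kinetic term: log T_ab
        ll -= math.log(volumes[b])                  # conformational term: log p(x | b)
    return ll


def information_criteria(xs, edges, lo, hi):
    """Return (AIC, BIC) for the interval-state MSM defined by `edges`."""
    n = len(edges) + 1
    k = n * (n - 1)        # ASSUMPTION: free parameters = free transition probabilities
    n_obs = len(xs) - 1    # number of observed transitions
    ll = log_likelihood(xs, edges, lo, hi)
    return -2.0 * ll + 2.0 * k, -2.0 * ll + k * math.log(n_obs)


def toy_trajectory(n_steps, seed=0):
    """Metropolis random walk on the double well U(x) = 5 (x^2 - 1)^2, x in [-2, 2]."""
    rng = random.Random(seed)

    def potential(y):
        return 5.0 * (y * y - 1.0) ** 2

    x, traj = -1.0, []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, 0.3)  # symmetric proposal
        # Reject moves outside the domain; otherwise accept with min(1, exp(-dU)).
        if -2.0 <= y <= 2.0 and rng.random() < math.exp(min(0.0, potential(x) - potential(y))):
            x = y
        traj.append(x)
    return traj


if __name__ == "__main__":
    xs = toy_trajectory(20000)
    for n_states in (2, 4, 8, 16, 32, 64):
        width = 4.0 / n_states
        edges = [-2.0 + width * i for i in range(1, n_states)]
        aic, bic = information_criteria(xs, edges, -2.0, 2.0)
        print(f"{n_states:3d} states  AIC={aic:12.1f}  BIC={bic:12.1f}")
```

Refining a partition never lowers the maximized log-likelihood in this setup. A finer model can always reproduce a coarser one by splitting each coarse transition probability in proportion to sub-interval widths. The penalty term is therefore what stops the criteria from favoring ever-finer states. The minimizing number of states depends on trajectory length and on the toy potential, so no particular value should be read into the printed table.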
