Variational cross-validation of slow dynamical modes in molecular kinetics.

Markov state models are a widely used method for approximating the eigenspectrum of the molecular dynamics propagator, yielding insight into the long-timescale statistical kinetics and slow dynamical modes of biomolecular systems. However, the lack of a unified theoretical framework for choosing between alternative models has hampered progress, especially for non-experts applying these methods to novel biological systems. Here, we consider cross-validation with a new objective function for estimators of these slow dynamical modes, a generalized matrix Rayleigh quotient (GMRQ), which measures the ability of a rank-m projection operator to capture the slow subspace of the system. We show that a variational theorem bounds the GMRQ from above by the sum of the first m eigenvalues of the system's propagator, but that this bound can be violated when the requisite matrix elements are estimated subject to statistical uncertainty. This overfitting can be detected and avoided through cross-validation. These results make it possible to construct Markov state models for protein dynamics in a way that appropriately captures the tradeoff between systematic and statistical errors.
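As an illustration of how a GMRQ score might be evaluated on held-out data, the following is a minimal sketch, not the authors' implementation. It assumes a discrete-state trajectory, a symmetrized count-based estimate of the time-lagged correlation matrix C and overlap matrix S, and a score of the form Tr[(AᵀSA)⁻¹ AᵀCA] for a basis A of m trial modes. The toy transition matrix, the function names (`correlation_matrices`, `fit_slow_modes`, `gmrq`, `top_m_sum`), and the single train/test split are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.linalg import eigh


def correlation_matrices(traj, n_states, lag=1):
    """Symmetrized estimates of the time-lagged correlation matrix C
    and the overlap matrix S from a discrete-state trajectory."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (traj[:-lag], traj[lag:]), 1.0)
    C = 0.5 * (counts + counts.T) / counts.sum()
    S = np.diag(C.sum(axis=1))  # positive definite if every state is visited
    return C, S


def fit_slow_modes(C, S, m):
    """Top-m generalized eigenvectors of C v = lambda S v."""
    _, evecs = eigh(C, S)  # eigenvalues are returned in ascending order
    return evecs[:, -m:]


def gmrq(A, C, S):
    """Generalized matrix Rayleigh quotient Tr[(A^T S A)^{-1} A^T C A]."""
    return np.trace(np.linalg.solve(A.T @ S @ A, A.T @ C @ A))


def top_m_sum(C, S, m):
    """Sum of the m largest generalized eigenvalues of (C, S)."""
    return eigh(C, S, eigvals_only=True)[-m:].sum()


# Toy 3-state Markov chain (assumed for illustration only).
rng = np.random.default_rng(0)
T = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.85, 0.05],
              [0.02, 0.08, 0.90]])
traj = np.zeros(4000, dtype=int)
for t in range(1, len(traj)):
    traj[t] = rng.choice(3, p=T[traj[t - 1]])

# Fit the slow modes on one half, score them on the other half.
m = 2
train, test = traj[:2000], traj[2000:]
C_train, S_train = correlation_matrices(train, n_states=3)
C_test, S_test = correlation_matrices(test, n_states=3)

A = fit_slow_modes(C_train, S_train, m)
train_score = gmrq(A, C_train, S_train)
test_score = gmrq(A, C_test, S_test)
train_bound = top_m_sum(C_train, S_train, m)
test_bound = top_m_sum(C_test, S_test, m)
```

In this sketch the training score coincides with the sum of the top m eigenvalues of the training estimate, so it cannot by itself reveal overfitting. The held-out score can never exceed the corresponding eigenvalue sum of the test matrices, and a large gap between training and held-out scores is the kind of signal that cross-validation is meant to expose. In practice one would average the held-out score over several folds rather than rely on a single split.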
