Linking time-series of single-molecule experiments with molecular dynamics simulations by machine learning

Single-molecule experiments and molecular dynamics (MD) simulations are indispensable tools for investigating protein conformational dynamics. The former provide time-series data, such as donor-acceptor distances, whereas the latter give atomistic information, although this information is often biased by model parameters. Here, we devise a machine-learning method that combines the complementary information from the two approaches to construct a consistent model of conformational dynamics. We apply it to the folding dynamics of the formin-binding protein WW domain. MD simulations totaling more than 400 μs yielded an initial Markov state model (MSM), which was then "refined" against single-molecule Förster resonance energy transfer (FRET) data through hidden Markov modeling. The refined, or data-assimilated, MSM reproduces the FRET data and features hairpin 1 in the transition-state ensemble, consistent with mutation experiments. The folding pathway in the data-assimilated MSM suggests an interplay between hydrophobic contacts and turn formation. Our method provides a general framework for investigating conformational transitions in other proteins.
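The refinement step described above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a two-state model, FRET efficiencies discretized into three bins, fixed emission probabilities `B`, and synthetic data standing in for both the MD trajectory and the measurement. All function names (`estimate_msm`, `refine_transition_matrix`, and so on) are hypothetical. The idea shown is only the general one: estimate an initial transition matrix from simulation, then update it with Baum-Welch (expectation-maximization) iterations so that it better explains an observed time series.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag=1):
    """Initial MSM: row-normalized transition counts at the given lag."""
    counts = np.full((n_states, n_states), 1e-3)  # small pseudocount
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def forward_backward(T, B, pi, obs):
    """Scaled forward-backward pass of a discrete-emission HMM."""
    n, k = len(obs), T.shape[0]
    alpha, beta, c = np.zeros((n, k)), np.ones((n, k)), np.zeros(n)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(n - 2, -1, -1):
        beta[t] = (T @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    return alpha, beta, c

def refine_transition_matrix(T0, B, pi, obs, n_iter=20):
    """Baum-Welch updates of the transition matrix only (B and pi stay fixed)."""
    T, loglik = T0.copy(), []
    for _ in range(n_iter):
        alpha, beta, c = forward_backward(T, B, pi, obs)
        loglik.append(np.log(c).sum())
        # Expected transition counts summed over time.
        w = B[:, obs[1:]].T * beta[1:] / c[1:, None]
        xi = T * (alpha[:-1].T @ w)
        T = xi / xi.sum(axis=1, keepdims=True)
    return T, loglik

def simulate_states(T, n, rng):
    """Sample a hidden-state trajectory from transition matrix T."""
    s = np.zeros(n, dtype=int)
    for t in range(1, n):
        s[t] = rng.choice(T.shape[0], p=T[s[t - 1]])
    return s

rng = np.random.default_rng(0)
# Assumed toy parameters: "true" kinetics vs. a biased simulation model.
T_true = np.array([[0.95, 0.05], [0.10, 0.90]])
T_md = np.array([[0.80, 0.20], [0.30, 0.70]])
B = np.array([[0.7, 0.2, 0.1],   # state 0: mostly low-FRET bin
              [0.1, 0.2, 0.7]])  # state 1: mostly high-FRET bin
pi = np.array([0.5, 0.5])

md_dtraj = simulate_states(T_md, 5000, rng)    # stand-in for MD data
hidden = simulate_states(T_true, 5000, rng)    # unobserved in practice
obs = np.array([rng.choice(3, p=B[s]) for s in hidden])  # binned FRET signal

T0 = estimate_msm(md_dtraj, n_states=2)
T_refined, loglik = refine_transition_matrix(T0, B, pi, obs)
```

Keeping the emission model fixed and updating only the transition probabilities is a simplification chosen for this sketch; a photon-by-photon likelihood or a larger state space would change `B` and the forward-backward details but not the overall structure of the loop.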
