Reconstruction of protein structures from single-molecule time series.

Single-molecule experimental techniques track the real-time dynamics of molecules by recording a small number of experimental observables. Following these observables provides a coarse-grained, low-dimensional representation of the conformational dynamics but does not furnish an atomistic representation of the instantaneous molecular structure. Takens's delay embedding theorem asserts that, under quite general conditions, these low-dimensional time series can contain sufficient information to reconstruct the full molecular configuration of the system up to an a priori unknown transformation. By combining Takens's theorem with tools from statistical thermodynamics, manifold learning, artificial neural networks, and rigid graph theory, we establish an approach, Single-molecule TAkens Reconstruction (STAR), to learn this transformation and reconstruct molecular configurations from time series in experimentally measurable observables, such as intramolecular distances accessible to single-molecule Förster resonance energy transfer (smFRET). We demonstrate the approach in applications to molecular dynamics simulations of a C24H50 polymer chain and the artificial mini-protein chignolin. The trained models reconstruct molecular configurations from synthetic time series of head-to-tail intramolecular distances with atomistic root-mean-square deviation (RMSD) accuracies better than 0.2 nm. This work demonstrates that it is possible to accurately reconstruct protein structures from time series in experimentally measurable observables and establishes the theoretical and algorithmic foundations to do so in applications to real experimental data.
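The delay embedding at the core of Takens's theorem can be illustrated with a minimal sketch in Python (NumPy). It assumes a uniformly sampled scalar observable, such as a single intramolecular distance, and stacks time-lagged copies of it into vectors. The function name `delay_embed` and its parameters `dim` (embedding dimension) and `tau` (delay, in sampling steps) are hypothetical choices for illustration; in practice both would be selected with a data-driven criterion rather than fixed by hand.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Return Takens delay vectors [x(t), x(t - tau), ..., x(t - (dim - 1) * tau)].

    x   : 1-D sequence, a scalar time series with uniform sampling
    dim : embedding dimension (number of lagged copies per vector)
    tau : delay, measured in sampling steps
    The result has shape (len(x) - (dim - 1) * tau, dim); row i corresponds
    to time index t = i + (dim - 1) * tau.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1:
        raise ValueError("x must be a one-dimensional time series")
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("time series is too short for this dim and tau")
    # Column k holds the series lagged by k * tau steps.
    columns = [x[(dim - 1 - k) * tau:(dim - 1 - k) * tau + n] for k in range(dim)]
    return np.column_stack(columns)
```

For example, a sine wave sampled at 100 points per period and embedded with `dim=2` and `tau=25` (a quarter period) yields points that lie on the unit circle: a closed loop that recovers the geometry of the periodic orbit from a single coordinate.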
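Recovering Cartesian coordinates from intramolecular distances, and scoring a reconstruction by RMSD, can be sketched with two standard tools. Classical multidimensional scaling recovers coordinates from a complete, noise-free matrix of pairwise distances, up to a rigid motion and a reflection. The Kabsch algorithm finds the optimal rotation before the RMSD is computed. This is an illustrative NumPy sketch under those assumptions, not the trained-model procedure described in the abstract; `coords_from_distances` and `kabsch_rmsd` are hypothetical names.

```python
import numpy as np


def coords_from_distances(D, dim=3):
    """Classical multidimensional scaling.

    Recovers an (n, dim) set of coordinates from an (n, n) matrix of pairwise
    Euclidean distances. The result is centered at the origin and is unique
    only up to a rotation and a reflection.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    G = -0.5 * J @ (D ** 2) @ J           # Gram matrix of the centered points
    w, V = np.linalg.eigh(G)              # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]       # indices of the dim largest eigenvalues
    scale = np.sqrt(np.clip(w[top], 0.0, None))  # clip round-off negatives
    return V[:, top] * scale


def kabsch_rmsd(P, Q):
    """RMSD between conformations P and Q, each (n, 3), after optimal superposition.

    Both sets are translated to their centroids, and the Kabsch algorithm gives
    the proper rotation (determinant +1) that best maps P onto Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                           # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last singular direction if needed so R is a rotation, not a reflection.
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    fix = np.diag([1.0] * (P.shape[1] - 1) + [sign])
    R = Vt.T @ fix @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

Because distances cannot distinguish a structure from its mirror image, a reconstruction from distances alone should be compared with the reference both as returned and after inverting one axis (for example `X_rec * [1, 1, -1]`), keeping the smaller RMSD.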
