t-Distributed Stochastic Neighbor Embedding Method with the Least Information Loss for Macromolecular Simulations.

Dimensionality reduction methods are routinely applied to molecular dynamics simulations of macromolecules for analysis and visualization. A suitable method should clearly distinguish the functionally important states of the system of interest, which differ in conformation. However, the methods commonly used for macromolecular simulations, including predefined order parameters and collective variables (CVs), principal component analysis (PCA), and time-structure based independent component analysis (t-ICA), have had only limited success because they lose significant key structural information. Here, we introduce the t-distributed stochastic neighbor embedding (t-SNE) method, a dimensionality reduction method with minimal structural information loss that is widely used in bioinformatics, for the analysis of simulations of macromolecules, especially biomacromolecules. We demonstrate that both one-dimensional (1D) and two-dimensional (2D) t-SNE models are superior in distinguishing the important functional states of a model allosteric protein system for free energy and mechanistic analysis. Projections of the model protein simulations onto 1D and 2D t-SNE surfaces provide both clear visual cues and quantitative information, not readily available from other methods, about the transition mechanism between two important functional states of this protein.
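
For context, the information loss that t-SNE minimizes can be written explicitly. The block below is a sketch of the standard t-SNE objective as generally formulated, not necessarily the exact protocol used in this work. The symbols are assumptions introduced here for illustration: x_i is the high-dimensional representation of simulation frame i, y_i is its 1D or 2D embedding, N is the number of frames, and sigma_i is a per-frame bandwidth set by the user-chosen perplexity.

```latex
% Conditional similarity of frame j to frame i in the original space
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Similarity in the embedded space, using a Student t kernel with one degree of freedom
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Objective: Kullback-Leibler divergence between the two similarity distributions
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Minimizing C with respect to the embedded points y_i keeps frames that are close in the original conformational space close in the projection, which is the sense in which the embedding can be said to limit information loss.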
