Uncovering Large-Scale Conformational Change in Molecular Dynamics without Prior Knowledge.

As molecular dynamics (MD) trajectories grow longer with increasing computational power, clustering methods for partitioning trajectories into conformational bins become more important. The vast majority of available methods require users either to have some a priori knowledge about the system to be clustered or to tune clustering parameters through trial and error. Here we present non-parametric uses of two modern clustering techniques suitable for first-pass investigation of an MD trajectory. Being non-parametric, these methods require neither prior knowledge nor parameter tuning. The first method, HDBSCAN, is fast relative to other popular clustering methods and is able to group unstructured or intrinsically disordered systems (such as intrinsically disordered proteins, or IDPs) into bins that represent global conformational shifts. HDBSCAN is also useful for determining the overall stability of a system, as it tends to group stable systems into one or two bins, and for identifying transition events between metastable states. The second method, iMWK-Means with explicit rescaling followed by K-Means, is slower than HDBSCAN but performs well with stable, structured systems such as folded proteins and is able to identify higher-resolution details such as changes in the relative position of secondary structural elements. Used in conjunction, these clustering methods allow a user to discern the stability of a simulated system quickly and without prior knowledge, and to identify both local and global conformational changes.
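
The phrase "explicit rescaling followed by K-Means" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the feature-weight formula commonly given for Minkowski weighted K-Means (a feature's weight shrinks as its within-cluster dispersion grows), fixes the Minkowski exponent at p = 2 so that plain Euclidean K-Means applies after rescaling, and uses a hypothetical initial partition in place of real iMWK-Means output. The toy data and all function names are illustrative only.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(x[v] for x in points) / n for v in range(len(points[0]))]

def feature_weights(points, center, p=2.0):
    """Per-feature weights for one cluster (assumed MWK-Means-style formula).

    Features with small within-cluster dispersion get large weights;
    the weights sum to 1. Requires p > 1.
    """
    n_feat = len(center)
    # Dispersion D_v = sum_i |x_iv - c_v|^p; small constant avoids division by zero.
    disp = [sum(abs(x[v] - center[v]) ** p for x in points) + 1e-12
            for v in range(n_feat)]
    exponent = 1.0 / (p - 1.0)
    return [1.0 / sum((disp[v] / disp[u]) ** exponent for u in range(n_feat))
            for v in range(n_feat)]

def lloyd_kmeans(data, centers, n_iter=20):
    """Plain K-Means (Lloyd's algorithm) starting from the given centers."""
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
                  for x in data]
        new_centers = []
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            new_centers.append(centroid(members) if members else centers[k])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers

# Toy data: feature 0 separates two groups, feature 1 is wide-spread noise.
data = [[0.0, 3.0], [0.2, -4.0], [0.1, 5.0],
        [5.0, -3.0], [5.2, 4.0], [5.1, -5.0]]
# Hypothetical initial partition standing in for the weighted clustering step.
initial_labels = [0, 0, 0, 1, 1, 1]

clusters = [[x for x, lab in zip(data, initial_labels) if lab == k] for k in range(2)]
weights = [feature_weights(c, centroid(c)) for c in clusters]

# Explicit rescaling: multiply each point by the weights of its cluster.
rescaled = [[w * xv for w, xv in zip(weights[lab], x)]
            for x, lab in zip(data, initial_labels)]

# Plain K-Means on the rescaled data, seeded with one point from each group.
labels, centers = lloyd_kmeans(rescaled, [rescaled[0], rescaled[3]])
print(weights)
print(labels)
```

In this toy case the noisy feature receives a weight close to zero, so after rescaling K-Means separates the points almost entirely on the informative feature. For real trajectories the features would typically be atomic coordinates or internal coordinates per frame, and the initial partition and weights would come from the weighted clustering step itself.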
