Principal component and clustering analysis on molecular dynamics data of the ribosomal L11·23S subdomain

With improvements in computer speed and algorithm efficiency, molecular dynamics (MD) simulations are sampling ever larger numbers of molecular and biomolecular conformations. Sifting these conformations qualitatively and quantitatively into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, to reveal the major conformational changes that occur in MD simulations. Specifically, we explored how clustering different PC subspaces affects the resulting clusters compared with clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacterial L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed with the K-means and average-linkage algorithms on data spanning the first two to the first five PC subspace dimensions. For the average-linkage algorithm, we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates (a) the complexities in selecting the appropriate clustering algorithm, (b) the complexities in interpreting and validating the results, and (c) that combining PC analysis with subsequent clustering can yield valuable dynamic and conformational information.
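The comparison described in the abstract (clustering in the first two to five PC dimensions versus clustering the complete data) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' protocol: the "trajectory" is synthetic (three well-separated Gaussian states in 30 coordinates), K-means is a plain Lloyd iteration with a deterministic farthest-point initialization chosen here for reproducibility, and agreement between partitions is scored with the Rand index. All function names are hypothetical, and average-linkage clustering is not shown.

```python
import numpy as np

def make_toy_trajectory(n_per_state=100, n_coords=30, seed=0):
    """Synthetic stand-in for MD frames: three Gaussian states (assumption)."""
    rng = np.random.default_rng(seed)
    centers = np.zeros((3, n_coords))
    centers[1, 0] = 20.0
    centers[2, 1] = 20.0
    return np.vstack([c + 0.5 * rng.standard_normal((n_per_state, n_coords))
                      for c in centers])

def pca_project(X, n_components):
    """Project frames (rows of X) onto the first n_components PCs."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are PC directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def init_centers(X, k):
    """Farthest-point initialization (illustrative choice, deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means; returns one integer label per frame."""
    centers = init_centers(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def rand_index(a, b):
    """Fraction of frame pairs on which two partitions agree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))

def compare_subspaces(X, k, dims):
    """Rand index between full-data clusters and clusters in each PC subspace."""
    full_labels = kmeans(X - X.mean(axis=0), k)
    return {d: rand_index(full_labels, kmeans(pca_project(X, d), k))
            for d in dims}

if __name__ == "__main__":
    X = make_toy_trajectory()
    for d, ri in compare_subspaces(X, k=3, dims=range(2, 6)).items():
        print(f"first {d} PCs vs. full data: Rand index = {ri:.3f}")
```

The Rand index is used because cluster labels are arbitrary: it compares which pairs of frames are grouped together rather than the label values themselves. For real trajectories the rows of `X` would be aligned atomic coordinates per frame, and the same comparison could be repeated with a hierarchical algorithm to probe the subspace sensitivity reported for average linkage.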
