Cluster analysis of molecular simulation trajectories for systems where both conformation and orientation of the sampled states are important

Clustering methods are widely used to group similar conformational states sampled in molecular simulations of biomolecules in solution. For applications such as the interaction of a protein with a surface, the orientation of the protein relative to the surface is also an important clustering parameter because of its potential effect on adsorbed‐state bioactivity. This study presents cluster analysis methods designed specifically for systems where both molecular orientation and conformation are important, and validates them on test cases of adsorbed proteins. Because cluster analysis can be highly subjective, the study also presents an objective procedure for identifying both the optimal number of clusters and the best clustering algorithm for a given dataset. The procedure is demonstrated for several agglomerative hierarchical clustering algorithms used in conjunction with three cluster validation techniques. © 2016 Wiley Periodicals, Inc.
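The selection procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance metric, the linkage rules, or the three validation techniques, so everything below is an assumption made for illustration. Each frame is a hypothetical pair (conformational descriptor, orientation angle in degrees), the combined distance is a simple weighted Euclidean form with an arbitrary weight `w_orient`, and the mean silhouette width stands in as the single validation index. All function names are hypothetical.

```python
import math
from itertools import combinations

def distance_matrix(frames, w_orient=1.0):
    """Pairwise distances between frames given as (conf, angle_deg) tuples.
    ASSUMPTION: weighted Euclidean mix of conformation and orientation,
    chosen for illustration only (not the metric used in the paper)."""
    n = len(frames)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dc = frames[i][0] - frames[j][0]
        da = abs(frames[i][1] - frames[j][1]) % 360.0
        da = min(da, 360.0 - da)  # shortest angular difference
        d[i][j] = d[j][i] = math.hypot(dc, w_orient * da)
    return d

def agglomerate(d, k, linkage="average"):
    """Naive agglomerative clustering: merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(d))]

    def link(a, b):
        pair = [d[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair)
        if linkage == "complete":
            return max(pair)
        return sum(pair) / len(pair)  # average linkage

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def silhouette(d, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    ids = sorted(set(labels))
    groups = {c: [i for i, l in enumerate(labels) if l == c] for c in ids}
    total = 0.0
    for i, l in enumerate(labels):
        own = groups[l]
        if len(own) == 1:
            continue
        a = sum(d[i][j] for j in own) / (len(own) - 1)
        b = min(sum(d[i][j] for j in groups[c]) / len(groups[c])
                for c in ids if c != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def select(d, ks=range(2, 6), linkages=("single", "complete", "average")):
    """Return (score, linkage, k) with the highest mean silhouette width."""
    return max((silhouette(d, agglomerate(d, k, m)), m, k)
               for m in linkages for k in ks)

# Toy data (hypothetical): groups 1 and 2 share a conformation but differ
# in orientation; group 3 differs in both.
frames = [(1.0, 10), (1.2, 12), (0.9, 8), (1.1, 11),
          (1.0, 170), (1.1, 172), (0.9, 168), (1.2, 171),
          (4.0, 90), (4.2, 92), (3.9, 88), (4.1, 91)]
d = distance_matrix(frames, w_orient=0.05)
score, linkage, k = select(d)
labels = agglomerate(d, k, linkage)
print(k, linkage, round(score, 3))
```

In this toy set, a conformation-only distance could not separate the first two groups, which is the motivation for including orientation in the metric. A fuller version would compare several validation indices rather than relying on one, as the abstract describes.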
