Clustering algorithms applied on analysis of protein molecular dynamics

Analyzing molecular dynamics (MD) simulations is difficult because the method generates a large number of conformations. Clustering algorithms have therefore been applied to group similar structures from MD simulations, but choosing which information to cluster remains a challenge. In this work, we propose using Euclidean distance matrices (EDMs) computed from the conformations as input to clustering algorithms. We combined non-reduced data and data reduced in dimensionality (with the MDS and Isomap methods) with different clustering algorithms (k-means, Ward, mean-shift, and affinity propagation). The results indicate that EDMs can be suitable input for clustering conformations from MD. For data with small variation in protein structure, the mean-shift algorithm performed well on both non-reduced and reduced data. For data with large variation in protein structure, however, the methods that work better with smooth-density data (k-means and Ward) performed well.
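
As an illustration of the kind of input described above, the following minimal sketch turns each conformation into a feature vector built from its Euclidean distance matrix. It is not the authors' implementation: the function names `edm_features` and `feature_distance`, the toy four-atom coordinates, and the choice to keep only the upper triangle of the matrix are all assumptions made for this example. It also shows one property that follows from using pairwise distances: a rigidly translated copy of a conformation yields the same features, so no prior structural superposition is needed.

```python
import math
from itertools import combinations


def edm_features(coords):
    """Return the upper triangle of one conformation's Euclidean
    distance matrix as a flat list (one distance per atom pair i < j)."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def feature_distance(f, g):
    """Euclidean distance between two EDM feature vectors."""
    return math.dist(f, g)


# Hypothetical toy "trajectory": three conformations of a 4-atom chain.
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
# Same shape as `compact`, only translated in space.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in compact]
# A genuinely different, stretched-out shape.
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

# One row per conformation; each row has n * (n - 1) / 2 distances.
features = [edm_features(c) for c in (compact, shifted, extended)]

same_shape = feature_distance(features[0], features[1])   # close to zero
different_shape = feature_distance(features[0], features[2])  # clearly larger
print(same_shape, different_shape)
```

Each row of `features` could then be passed, unchanged or after dimensionality reduction, to a clustering routine of the kind listed in the abstract.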
