A fast parallel clustering algorithm for molecular simulation trajectories

We implemented a GPU‐powered parallel k‐centers algorithm to cluster the conformations sampled in molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested it on four protein MD simulation datasets, ranging from the small alanine dipeptide to the 370‐residue maltose binding protein (MBP). It can group 250,000 conformations of MBP into 4000 clusters within 40 seconds. To achieve this, we parallelized the code on the GPU and exploited the triangle inequality of metric spaces to skip unnecessary distance calculations. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions, and we provide a mathematical rationale for this. Finally, using alanine dipeptide as an example, we show a strong correlation between the cluster populations resulting from the k‐centers algorithm and the underlying density. © 2012 Wiley Periodicals, Inc.
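
The role of the triangle inequality in k‐centers clustering can be illustrated with a small serial sketch. This is not the GPU implementation described above: it is a plain Python version of the standard farthest‐first k‐centers heuristic, written under the assumption that a point x currently assigned to center a can skip the distance to a new center c whenever d(a, c) ≥ 2 d(x, a), since then d(x, c) ≥ d(a, c) − d(x, a) ≥ d(x, a). The function names, the `skipped` counter, and the use of Euclidean distance in place of RMSD are all illustrative choices, not taken from the paper.

```python
import math


def euclidean(a, b):
    """Stand-in metric (assumption); the paper's setting would use RMSD."""
    return math.dist(a, b)


def k_centers(points, k, dist=euclidean, first=0):
    """Farthest-first k-centers with triangle-inequality pruning.

    Returns (centers, assign, d_near, skipped):
      centers - indices of the chosen center points
      assign  - for each point, position of its center within `centers`
      d_near  - for each point, distance to its assigned center
      skipped - number of distance evaluations avoided by pruning
    """
    n = len(points)
    centers = [first]
    assign = [0] * n
    d_near = [dist(p, points[first]) for p in points]
    skipped = 0

    while len(centers) < k:
        # The next center is the point farthest from its current center.
        new = max(range(n), key=d_near.__getitem__)
        if d_near[new] == 0.0:
            break  # every point already coincides with a center

        # Distances from the new center to each existing center.
        c2c = [dist(points[new], points[c]) for c in centers]
        new_pos = len(centers)
        centers.append(new)

        for i in range(n):
            # If d(a, c) >= 2 d(x, a), the new center cannot be closer.
            if c2c[assign[i]] >= 2.0 * d_near[i]:
                skipped += 1
                continue
            d = dist(points[i], points[new])
            if d < d_near[i]:
                d_near[i] = d
                assign[i] = new_pos

    return centers, assign, d_near, skipped
```

In this sketch each new center costs at most one pass over the data, which is consistent with a running time that grows linearly in the number of centers; the inner loop over points is the part that a GPU version would distribute across threads.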
