Adaptive partitioning by local density‐peaks: An efficient density‐based clustering algorithm for analyzing molecular dynamics trajectories

We present APLoD, an efficient density‐based, adaptive‐resolution clustering method for analyzing large‐scale molecular dynamics (MD) trajectories. APLoD uses a k‐nearest‐neighbors search to estimate the density of MD conformations locally, so that conformations in the same high‐density region are grouped into one cluster. Compared with the popular density peaks algorithm, APLoD reduces running time and memory usage by 2–3 orders of magnitude for systems ranging from alanine dipeptide to a 370‐residue maltose‐binding protein. In addition, we demonstrate that APLoD produces clusters whose sizes adapt to the underlying density (i.e., larger clusters in low‐density regions and smaller clusters in high‐density regions), a clear advantage over other popular clustering algorithms such as k‐centers and k‐medoids. We anticipate that APLoD can be widely applied to partition ultra‐large MD datasets containing millions of conformations for the subsequent construction of Markov State Models. © 2016 Wiley Periodicals, Inc.
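
The idea of estimating density from k nearest neighbors and grouping conformations around local density peaks can be illustrated with a toy sketch. This is not the APLoD implementation: the abstract gives no algorithmic details, so the density estimate (inverse mean k‐nearest‐neighbor distance), the linking rule (attach each point to its nearest higher‐density neighbor among its k nearest neighbors), the brute‐force distance matrix, and the function name `knn_density_peak_clusters` are all assumptions made for illustration.

```python
import numpy as np


def knn_density_peak_clusters(points, k=5):
    """Toy local density-peak clustering (illustrative assumption, not APLoD itself)."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Brute-force pairwise Euclidean distances; only suitable for small toy inputs.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbor list

    # Indices of the k nearest neighbors of every point, nearest first.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    knn_dist = np.take_along_axis(dist, neighbors, axis=1)

    # Assumed local density estimate: inverse of the mean k-NN distance.
    density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)

    # Link each point to its nearest neighbor of strictly higher density.
    # Points with no such neighbor among their k-NN are local density peaks.
    parent = np.arange(n)
    for i in range(n):
        for j in neighbors[i]:
            if density[j] > density[i]:
                parent[i] = j
                break

    # Visit points from high to low density so a parent is always labeled first.
    labels = np.full(n, -1)
    n_clusters = 0
    for i in np.argsort(-density):
        if parent[i] == i:
            labels[i] = n_clusters
            n_clusters += 1
        else:
            labels[i] = labels[parent[i]]
    return labels, density


# Usage on synthetic 2D data: two well-separated blobs standing in for conformations.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.1, size=(30, 2))
blob_b = rng.normal(10.0, 0.1, size=(30, 2))
labels, density = knn_density_peak_clusters(np.vstack([blob_a, blob_b]), k=5)
```

Because every link stays within a point's k nearest neighbors, the search is purely local, and the number of clusters follows from the number of local density peaks rather than from a preset cluster count. A practical implementation for millions of conformations would replace the quadratic distance matrix with a spatial index or an approximate neighbor search.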
