Modified FDP cluster algorithm and its application in protein conformation clustering analysis

Abstract: We present a modified find density peaks (MFDP) clustering algorithm. In MFDP, the critical cutoff distance dc is determined automatically by minimizing the entropy of all points. Each high-dimensional point is then mapped into a 2D space defined by its local density ρ and its minimum distance δ to any point of higher density. The halo points of the original FDP clustering algorithm are redefined, and boundary points are introduced to depict the intersection region between clusters. To assess its clustering ability, MFDP is compared with the distance-based K-means algorithm and with the density-based algorithms DBSCAN and the original FDP, using four criteria for quantitative evaluation. On 20 commonly used benchmark datasets, MFDP gives better clustering results than K-means, DBSCAN, and FDP in most cases, and in particular depicts the intersection region between clusters more clearly. Finally, we evaluate the performance of MFDP in clustering conformations from molecular dynamics (MD) simulations. In the MD clustering process, eight representative cluster-center conformations are selected in six collective-variable spaces, and the results are in strong agreement with experimental results. These findings suggest that the modified algorithm has potential for broader application to similar problems.
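The abstract names the main steps of MFDP without spelling them out, so a small illustration of the underlying FDP pipeline may help: choose dc by entropy minimization, compute ρ and δ, pick centers with large γ = ρ·δ, and propagate labels from each point's nearest higher-density neighbour. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes that (a) the entropy is the Shannon entropy of normalized Gaussian potentials with the self term included, as in data-field variants of FDP; (b) dc is searched over a user-supplied candidate grid; (c) the number of clusters is given rather than read off the decision graph; and (d) the MFDP halo and boundary-point definitions are not reproduced. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def pairwise_distances(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Euclidean distance matrix d_ij."""
    return [[math.dist(p, q) for q in points] for p in points]


def potential_entropy(dist: List[List[float]], sigma: float) -> float:
    """Shannon entropy of normalized Gaussian potentials.

    ASSUMPTION: phi_i = sum_j exp(-(d_ij / sigma)^2) including j == i
    (data-field style), so every phi_i >= 1 and the entropy approaches
    log(n) for both very small and very large sigma.
    """
    phi = [sum(math.exp(-(d / sigma) ** 2) for d in row) for row in dist]
    z = sum(phi)
    return -sum((p / z) * math.log(p / z) for p in phi)


def choose_dc(dist: List[List[float]], candidates: Sequence[float]) -> float:
    """Hypothetical helper: use the lowest-entropy candidate width as dc."""
    return min(candidates, key=lambda s: potential_entropy(dist, s))


def density_peaks(points, n_clusters: int, candidates: Sequence[float]):
    dist = pairwise_distances(points)
    n = len(points)
    dc = choose_dc(dist, candidates)

    # Gaussian-kernel local density rho_i (self term excluded).
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]

    # Rank points by decreasing density; ties are broken by index (stable sort).
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta_i: distance to the nearest higher-ranked point; nh[i]: that point.
    # The densest point gets its largest distance to any other point.
    delta = [0.0] * n
    nh = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nh[i] = dist[i][j], j

    # Centers: the densest point (it has no higher-density neighbour)
    # plus the remaining points with the largest gamma = rho * delta.
    by_gamma = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in by_gamma if i != order[0]][: n_clusters - 1]

    # Assign in density order: each non-center point inherits the label
    # of its nearest higher-density neighbour, which is already labelled.
    labels = [-1] * n
    for i in order:
        labels[i] = centers.index(i) if i in centers else labels[nh[i]]
    return dc, rho, delta, labels


# Usage: two well-separated 2D blobs of five points each.
blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
blob_b = [(x + 5.0, y + 5.0) for x, y in blob_a]
grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
dc, rho, delta, labels = density_peaks(blob_a + blob_b, 2, grid)
print(dc, labels)
```

Including the self term in the potential is what gives the entropy an interior minimum: with it, every point looks equally "dense" at both extremes of sigma, so minimizing the entropy favours a width at which density differences are most pronounced. The O(n²) distance matrix keeps the sketch short but would not scale to long MD trajectories.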