An improved density peaks clustering algorithm with fast finding cluster centers

Abstract Speed and efficiency are common requirements for all clustering algorithms. The density peaks clustering (DPC) algorithm handles non-spherical clusters well. However, because large-scale data sets are difficult to store and DPC has high computational complexity, mining such data effectively remains a challenge. To address this issue, we propose an improved density peaks clustering algorithm that finds cluster centers quickly. It improves the efficiency of DPC by screening for points with higher local density using two novel prescreening strategies. The first strategy is based on grid division (GDPC) and screens points according to the density of their corresponding grid cells. The second strategy is based on circle division (CDPC) and screens points according to the uneven distribution of the data within the corresponding circles. Theoretical analysis and experimental results show that both prescreening strategies reduce the computational complexity, and that on large-scale data sets the proposed algorithm not only performs better than DPC but is also superior to the well-known Nyström-SC algorithm. Moreover, because the two prescreening strategies rest on different principles, the first is faster and the second is more accurate on large-scale data sets.
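The grid-based prescreening idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the screening rule, so keeping the cells whose occupancy is at least the mean over non-empty cells is an assumption, as are the names `grid_prescreen`, `density_peaks_scores`, `cells_per_dim`, and `d_c`. The scoring function follows the standard DPC definitions (cutoff-kernel local density and distance to the nearest point of higher density) and is applied only to the retained points. The circle-division strategy is not sketched here.

```python
import numpy as np


def grid_prescreen(points, cells_per_dim=10):
    """Return indices of points lying in comparatively dense grid cells.

    Assumption (not from the abstract): a cell counts as dense when its
    occupancy is at least the mean occupancy of all non-empty cells.
    """
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins
    spans[spans == 0] = 1.0  # avoid division by zero on constant dimensions
    # Map every point to integer cell coordinates in [0, cells_per_dim - 1].
    cells = np.floor((points - mins) / spans * cells_per_dim).astype(int)
    cells = np.clip(cells, 0, cells_per_dim - 1)
    # Count the points per occupied cell and look up each point's cell count.
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    return np.flatnonzero(counts[inverse] >= counts.mean())


def density_peaks_scores(points, d_c):
    """Standard DPC scores: local density rho and separation delta."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Cutoff kernel: number of other points closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = rho > rho[i]
        # Distance to the nearest denser point; the densest point(s)
        # conventionally receive their largest distance instead.
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta
```

Under these assumptions, candidate cluster centers are the retained points with the largest product `rho * delta`. The pairwise distance matrix is then built only over the screened subset, which is where the intended saving would come from.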
