Adaptive density peak clustering based on dimensional-free and reverse k-nearest neighbors

Cluster analysis plays a crucial role in consumer behavior segmentation. The density peak clustering (DPC) algorithm is a novel density-based clustering method. However, it performs poorly on high-dimensional datasets, and its local density estimate is inaccurate for boundary points. In addition, its fault tolerance is limited by its one-step allocation strategy. To overcome these disadvantages, this paper proposes an adaptive density peak clustering algorithm based on dimensional-free and reverse k-nearest neighbors (ERK-DPC). First, we compute the Euler cosine distance to measure the similarity of sample points in high-dimensional datasets. Then, an adaptive local density formula is used to measure the local density of each point. Finally, the reverse k-nearest neighbor idea is incorporated into a two-step allocation strategy, which assigns the remaining points accurately and effectively. The proposed algorithm is evaluated on several benchmark and real-world datasets. The comparative results demonstrate that ERK-DPC is superior to some state-of-the-art methods.
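The building blocks named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: the distance uses the cosine-based dissimilarity associated with Euler principal component analysis, d(x, y) = Σ_j (1 − cos(απ(x_j − y_j))), with features scaled to [0, 1] and an assumed α = 1.0; the density is a generic k-nearest-neighbor density standing in for the paper's adaptive formula; and the function names are hypothetical. The two-step allocation itself is not reproduced.

```python
import numpy as np

def euler_cosine_distance(X, alpha=1.0):
    """Pairwise d(x, y) = sum_j (1 - cos(alpha*pi*(x_j - y_j))).
    Assumed form; expects features scaled to [0, 1]."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]          # shape (n, n, d)
    return np.sum(1.0 - np.cos(alpha * np.pi * diff), axis=2)

def knn_indices(D, k):
    """Indices of each point's k nearest neighbors, excluding itself."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]

def reverse_knn(knn):
    """rknn[i] = set of points that count i among their k nearest neighbors."""
    rknn = [set() for _ in range(knn.shape[0])]
    for j, neighbors in enumerate(knn):
        for i in neighbors:
            rknn[int(i)].add(j)
    return rknn

def dpc_rho_delta(D, knn):
    """Generic kNN density rho (a stand-in, not the paper's formula) and,
    as in standard DPC, the distance delta to the nearest denser point."""
    n = D.shape[0]
    knn_d = np.take_along_axis(D, knn, axis=1)    # distances to the k neighbors
    rho = np.exp(-knn_d.mean(axis=1))
    order = np.argsort(-rho, kind="stable")       # densest point first
    delta = np.zeros(n)
    parent = np.full(n, -1)                       # nearest denser point, -1 for the densest
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(D[i, denser])]
        delta[i] = D[i, j]
        parent[i] = j
    return rho, delta, parent

# Toy data: two tight groups of three points each, already in [0, 1].
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [0.9, 0.9], [1.0, 0.9], [0.9, 1.0]])
D = euler_cosine_distance(X)
knn = knn_indices(D, k=2)
rknn = reverse_knn(knn)
rho, delta, parent = dpc_rho_delta(D, knn)
centers = np.argsort(rho * delta)[-2:]            # two largest rho*delta scores
print(sorted(int(c) for c in centers))
```

In standard DPC, points with both high density and large delta are taken as cluster centers, and every other point follows its `parent` link in a single pass. The reverse-neighbor sets in `rknn` are the kind of information a two-step allocation could consult before falling back to that link.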
