Adaptive density peak clustering based on dimensional-free and reverse k-nearest neighbors

Cluster analysis plays a crucial role in consumer behavior segmentation. The density peak clustering algorithm (DPC) is a novel density-based clustering method. However, it performs poorly on high-dimensional datasets and in estimating the local density of boundary points. In addition, its fault tolerance is weakened by its one-step allocation strategy, because an error in assigning one point propagates to the points assigned after it. To overcome these disadvantages, this paper proposes an adaptive density peak clustering algorithm based on dimensional-free and reverse k-nearest neighbors (ERK-DPC). First, the Euler cosine distance is computed to measure the similarity between sample points in high-dimensional datasets. Then, an adaptive local density formula is used to measure the local density of each point. Finally, the reverse k-nearest neighbor idea is incorporated into a two-step allocation strategy, which assigns the remaining points accurately and effectively. The proposed algorithm is evaluated on several benchmark and real-world datasets, and the comparison shows that ERK-DPC outperforms several state-of-the-art methods.
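The abstract compresses the whole pipeline into a few sentences. The Python sketch below lays out one plausible reading of it using only the standard library. It is not the authors' implementation, and several parts are assumptions. Features are assumed to be pre-scaled to [0, 1]. The Euler cosine distance is taken in the Euler PCA form d(x, y) = sum_j (1 - cos(alpha*pi*(x_j - y_j))) with alpha = 1.9. The abstract does not give the adaptive local density formula, so rho uses the k-nearest-neighbour density from DPC-KNN as a stand-in. The two allocation steps are also illustrative guesses: labels spread through mutual k-nearest neighbours, then a vote over kNN and reverse-kNN neighbours decides the rest. The function names `euler_cosine_distance` and `erk_dpc_sketch` are hypothetical.

```python
import math
from collections import Counter


def euler_cosine_distance(x, y, alpha=1.9):
    """Euler cosine dissimilarity: sum_j (1 - cos(alpha*pi*(x_j - y_j))).

    Assumes every feature has already been scaled to [0, 1].
    """
    return sum(1.0 - math.cos(alpha * math.pi * (a - b)) for a, b in zip(x, y))


def erk_dpc_sketch(points, n_clusters, k=5, alpha=1.9):
    """Illustrative DPC variant; the density and allocation rules are assumptions."""
    n = len(points)
    dist = [[euler_cosine_distance(p, q, alpha) for q in points] for p in points]

    # k nearest neighbours of each point (excluding the point itself).
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # Reverse kNN: j is a reverse neighbour of i when i is among j's k nearest.
    rknn = [[] for _ in range(n)]
    for j in range(n):
        for i in knn[j]:
            rknn[i].append(j)

    # Stand-in density (DPC-KNN style): exp(-(1/k) * sum of squared kNN distances).
    rho = [math.exp(-sum(dist[i][j] ** 2 for j in knn[i]) / k) for i in range(n)]

    # delta_i: distance to the nearest denser point; the densest point gets its max distance.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
        else:
            parent[i] = min(order[:rank], key=lambda m: dist[i][m])
            delta[i] = dist[i][parent[i]]

    # Centres: the n_clusters points with the largest gamma = rho * delta.
    centres = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c

    # Step 1 (assumed): grow each cluster through mutual k-nearest neighbours.
    stack = list(centres)
    while stack:
        i = stack.pop()
        for j in knn[i]:
            if labels[j] == -1 and i in knn[j]:
                labels[j] = labels[i]
                stack.append(j)

    # Step 2 (assumed): majority vote over labelled kNN and reverse-kNN neighbours.
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if labels[i] == -1:
                votes = Counter(labels[j] for j in knn[i] + rknn[i] if labels[j] != -1)
                if votes:
                    labels[i] = votes.most_common(1)[0][0]
                    changed = True

    # Fallback: original DPC rule, inherit the label of the nearest denser point.
    for i in order:
        if labels[i] == -1:
            if parent[i] >= 0:
                labels[i] = labels[parent[i]]
            else:  # the densest point was not picked as a centre
                labels[i] = labels[min((j for j in range(n) if labels[j] != -1),
                                       key=lambda j: dist[i][j])]
    return labels
```

For example, calling `erk_dpc_sketch(A + B, n_clusters=2, k=3)` on two small, well-separated groups of 2-D points in [0, 1] gives each group a single label. Centres are chosen from the decision value gamma = rho * delta, as in the original DPC. The pairwise loops cost O(n^3), so the sketch only suits small inputs. Because alpha < 2, the distance is nearly periodic: values close to 0 and close to 1 in the same feature end up close to each other. This behavior is deliberate in Euler PCA.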
