ROBP: a robust border-peeling clustering using Cauchy kernel

Abstract Recently, a novel density-based clustering algorithm, border-peeling (BP) clustering, was proposed to group data by iteratively identifying border points and peeling them off until separable areas of data remain. On several test cases, BP clustering correctly recognizes the true structure of clusters and automatically detects outliers. However, BP has drawbacks that may hinder its widespread application: it can yield poor results on datasets with non-uniformly distributed clusters and, in particular, it tends to over-partition data with complex shapes. To overcome these defects, this paper proposes a robust border-peeling clustering algorithm (ROBP). Our method improves BP clustering in two respects: density influence (i.e., density estimation) and linkage criterion (i.e., association strategy). For density estimation, we replace the Gaussian kernel in the local scaling function with the Cauchy kernel, which has longer tails, and propose a kernel density estimator based on the Cauchy kernel. It computes the density influence value of each point quickly and accurately. For the association strategy, we design a linkage criterion based on shared neighborhood information. The criterion creates links between peeled border points and their neighboring peeled border points in order to avoid over-segmentation of clusters. We integrate the proposed linkage criterion with the uni-directional association strategy to obtain a bi-directional association strategy. In experiments, we compare ROBP with 7 representative density-based (or hierarchical) clustering algorithms, namely BP, DBSCAN, HDBSCAN, density peak (DP) clustering, DPC-KNN, DPC-DBFN and McDPC, on 8 synthetic datasets and 11 real-world datasets. The results show that the proposed algorithm outperforms the 7 competitors in most cases. Moreover, we compare the robustness of ROBP and BP and evaluate their running time. The experimental results indicate that ROBP is much more robust and reliable than BP and is competitive with it in computational efficiency.
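
The two ingredients named in the abstract, a longer-tailed Cauchy kernel in place of the Gaussian kernel and a link test based on shared neighbors, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the kernel forms, the fixed bandwidth `sigma`, the plain sum-over-all-points density, and the helper names (`cauchy_kernel`, `gaussian_kernel`, `density`, `shared_neighbours`) are all assumptions made for illustration, and the local scaling function and the exact linkage criterion of the paper are not reproduced here.

```python
import math

def gaussian_kernel(d, sigma):
    # Gaussian kernel: decays exponentially with squared distance.
    return math.exp(-(d * d) / (sigma * sigma))

def cauchy_kernel(d, sigma):
    # Cauchy kernel: decays polynomially, so its tails are longer.
    return 1.0 / (1.0 + (d * d) / (sigma * sigma))

def density(points, kernel, sigma=1.0):
    # Illustrative density: sum of kernel values to all other points.
    # A fixed bandwidth is an assumption of this sketch.
    return [
        sum(kernel(math.dist(p, q), sigma)
            for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def k_nearest(points, i, k):
    # Indices of the k nearest neighbours of points[i] (brute force).
    order = sorted(
        (math.dist(points[i], q), j)
        for j, q in enumerate(points) if j != i
    )
    return {j for _, j in order[:k]}

def shared_neighbours(points, i, j, k):
    # Number of points that appear in both k-nearest-neighbour sets.
    return len(k_nearest(points, i, k) & k_nearest(points, j, k))

# Hypothetical toy data: three nearby points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
cauchy_density = density(points, cauchy_kernel)
gauss_density = density(points, gaussian_kernel)
shared = shared_neighbours(points, 0, 1, k=2)
```

In this toy setting the far-away point receives a small but clearly non-zero Cauchy density, whereas its Gaussian density is practically zero. This is the long-tail behavior the abstract refers to. A linkage rule could then, for example, connect two peeled border points when `shared_neighbours` exceeds a chosen threshold; the threshold and the rule itself are again assumptions of this sketch.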
