SMpeaks: a semi-supervised clustering algorithm based on density peaks

Clustering by fast search and find of Density Peaks (referred to as DP) was introduced by Alex Rodriguez and Alessandro Laio. The DP algorithm is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. The algorithm can discover clusters regardless of their shapes and of the dimensionality of the space containing them. However, it cannot effectively detect arbitrarily shaped clusters that differ in size and density, especially when a single cluster contains multiple density peaks. Moreover, the DP algorithm requires the cluster centers to be selected manually from a decision graph. Motivated by the improved performance that semi-supervised clustering has achieved, we address these problems by proposing a semi-supervised framework for DP, namely SMpeaks, which integrates pairwise must-link and cannot-link constraints to guide the clustering procedure. We tested the SMpeaks algorithm on complex data sets containing clusters of arbitrary shapes, different sizes, and different densities. The experimental results demonstrate that SMpeaks is more effective than DP at finding clusters of complex shapes and different densities.
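To make the DP idea concrete, the following is a minimal sketch of the two per-point quantities that DP relies on: the local density (here called `rho`) and the distance to the nearest point of higher density (here called `delta`). It illustrates plain DP only, not SMpeaks or its constraint handling. The function name `density_peaks_scores`, the parameter name `cutoff`, and the choice of a hard cutoff kernel with Euclidean distance are assumptions made for illustration.

```python
import numpy as np


def density_peaks_scores(points, cutoff):
    """Return (rho, delta) per point; a sketch of plain DP, not SMpeaks.

    `cutoff` is an assumed neighborhood radius chosen by the user.
    """
    points = np.asarray(points, dtype=float)

    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density: number of *other* points closer than the cutoff
    # (the point itself is at distance 0, so subtract it).
    rho = (dist < cutoff).sum(axis=1) - 1

    n = len(points)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            # Distance to the nearest point with strictly higher density.
            delta[i] = dist[i, higher].min()
        else:
            # Convention for the densest points: use the largest distance.
            delta[i] = dist[i].max()
    return rho, delta
```

Points for which both `rho` and `delta` are large are candidate cluster centers. In plain DP these are picked by eye from a plot of `delta` against `rho` (the decision graph), which is the manual step the abstract refers to; ranking points by the product `rho * delta` is one common way to shortlist them.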
