SDenPeak: Semi-supervised Nonlinear Clustering Based on Density and Distance

Clustering by fast search and find of density peaks, termed DenPeak, is a recent and popular unsupervised clustering method that combines density and distance. However, in the fully unsupervised setting its accuracy degrades significantly when densities differ widely across clusters. Although semi-supervised clustering is known to improve performance considerably, no existing work incorporates supervision into DenPeak using only a few pairwise must-link and cannot-link constraints. To address this problem, we propose a semi-supervised framework for DenPeak, namely SDenPeak, which integrates pairwise constraints to guide the clustering procedure. Experimental results confirm that our algorithm is simple yet effective, producing satisfactory results on the real datasets tested.
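To make the two ingredients concrete, the following is a minimal sketch of the plain, unsupervised density-peaks procedure that DenPeak refers to, together with a helper that counts how many pairwise constraints a given labeling violates. It is not the SDenPeak algorithm, whose details the abstract does not give. The function names `density_peaks` and `count_violations`, the cutoff parameter `d_c`, the rule of picking centers by the largest product of density and distance, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def density_peaks(X, d_c, n_clusters):
    """Sketch of unsupervised density-peaks clustering with a cutoff kernel."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)                      # distance to nearest denser point
    parent = np.full(n, -1)                  # index of that nearest denser point
    delta[order[0]] = dist[order[0]].max()   # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                # points ranked denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j
    # Assumed center rule: the n_clusters largest values of rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf                 # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta

def count_violations(labels, must_link, cannot_link):
    """Count violated must-link and cannot-link pairs for a labeling."""
    ml = sum(labels[i] != labels[j] for i, j in must_link)
    cl = sum(labels[i] == labels[j] for i, j in cannot_link)
    return int(ml), int(cl)

# Toy data (hypothetical): two 3x3 grids of points, far apart.
grid = np.array([[x, y] for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)])
X = np.vstack([grid, grid + 10.0])
labels, rho, delta = density_peaks(X, d_c=0.8, n_clusters=2)
violations = count_violations(labels, must_link=[(0, 8)], cannot_link=[(0, 9)])
```

A semi-supervised variant would use the must-link and cannot-link pairs during clustering rather than only checking them afterwards; how SDenPeak does this is described in the paper body, not here.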
