Semi-supervised DenPeak Clustering with Pairwise Constraints

Density-based clustering is an important class of data clustering approaches because of its strong performance. Within this class, DenPeak is an effective method that can automatically determine the number of clusters and find arbitrarily shaped clusters in relatively easy scenarios. In many situations, however, DenPeak struggles to find an appropriate number of clusters without supervision or prior knowledge. In addition, DenPeak often fails to capture the local structure of each cluster because it assigns only one center to each cluster. To address these problems, we propose a novel semi-supervised DenPeak clustering (SSDC) method that uses pairwise constraints, or side information, to guide the clustering process. These constraints improve clustering performance by explicitly indicating whether the two samples in each pair belong to the same cluster. Concretely, SSDC first generates a relatively large number of temporary clusters and then merges them with the help of the samples' pairwise constraints and the temporary clusters' adjacency information. The proposed SSDC significantly improves the performance of DenPeak, and its superiority over state-of-the-art clustering methods is demonstrated empirically on both artificial and real data sets.
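
The two-stage idea described above (over-segment into temporary clusters, then merge under pairwise constraints) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Gaussian density kernel, the cutoff `d_c`, the center score `rho * delta`, and the union-find merge that honors must-link pairs unless a cannot-link pair forbids it are all assumptions chosen for illustration, and the adjacency information used by SSDC is omitted. The function names `temporary_clusters` and `merge_with_constraints` are hypothetical.

```python
import numpy as np

def temporary_clusters(X, d_c, n_temp):
    """Density-peak style over-segmentation into n_temp temporary clusters."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density with a Gaussian kernel (assumed); subtract the self term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)            # densest point first
    n = len(X)
    delta = np.zeros(n)                 # distance to the nearest denser point
    parent = np.full(n, -1)             # index of that nearest denser point
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j
    gamma = rho * delta
    gamma[order[0]] = np.inf            # the densest point is always a center
    centers = np.argsort(-gamma)[:n_temp]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:                     # parents are labeled before children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def merge_with_constraints(labels, must_link, cannot_link):
    """Merge temporary clusters joined by must-link pairs (simplified)."""
    root = list(range(int(labels.max()) + 1))

    def find(c):
        while root[c] != c:
            root[c] = root[root[c]]     # path halving
            c = root[c]
        return c

    def violates(a, b):
        # Would merging groups a and b unite a cannot-link pair?
        return any({find(labels[i]), find(labels[j])} == {a, b}
                   for i, j in cannot_link)

    for i, j in must_link:
        a, b = find(labels[i]), find(labels[j])
        if a != b and not violates(a, b):
            root[b] = a
    merged = np.array([find(c) for c in labels])
    return np.unique(merged, return_inverse=True)[1]

# Toy data: four tight 3x3 grids of points; grids 0/1 and 2/3 lie close together.
offsets = [(dx, dy) for dx in (-0.05, 0.0, 0.05) for dy in (-0.05, 0.0, 0.05)]
X = np.array([(cx + dx, dy) for cx in (0.0, 1.0, 10.0, 11.0) for dx, dy in offsets])
temp = temporary_clusters(X, d_c=0.3, n_temp=4)
final = merge_with_constraints(temp, must_link=[(0, 9), (18, 27)], cannot_link=[(0, 18)])
print(len(set(temp.tolist())), "temporary clusters ->", len(set(final.tolist())), "final clusters")
```

In this sketch the constraints only decide which temporary clusters are joined; a fuller treatment would also propagate merges between adjacent temporary clusters that carry no constraint, as the abstract indicates.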
