A novel density peak based semi-supervised clustering algorithm

With the rapid development of technology, acquiring and storing big data from various fields is no longer a problem. Instead, how to make use of the data has become an important and active research topic. Clustering is one of the key tasks in putting big data to use. However, the task faces a well-known challenge: it is difficult to incorporate prior information into the clustering results. In this paper, we propose a density-peak-based semi-supervised clustering algorithm that leverages the label information of a few seed objects to obtain a better clustering result. Specifically, we first adopt a density-based clustering algorithm to identify density peaks as candidate cluster centers for a dataset, and then propose a graph-based algorithm that assigns each center a class label using the given seed objects. Finally, we use the label information of the seed objects and the identified centers to generate must-link and cannot-link constraints for clustering. Extensive experiments on various publicly available datasets verify the effectiveness of the proposed method, and the results show that it substantially outperforms existing methods.
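The first and last steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard density-peak quantities (a cutoff-based local density `rho` and the distance `delta` to the nearest point of higher density) and picks the points with the largest `rho * delta` as centers. The function names, the cutoff `d_c`, and the choice of `n_centers` are hypothetical. The graph-based labeling of centers is not reproduced because the abstract gives no details of it. The sketch simply takes a dictionary of already labeled objects (seeds plus labeled centers) and derives pairwise constraints from it.

```python
import numpy as np
from itertools import combinations


def density_peaks(X, d_c, n_centers):
    """Return indices of the n_centers points with the largest rho * delta.

    Assumed formulation: rho is a neighbor count within cutoff d_c,
    delta is the distance to the nearest point of higher density.
    """
    # Full pairwise Euclidean distance matrix (fine for small data).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: neighbors closer than d_c, excluding the point itself.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        # Points with no denser neighbor get their largest distance instead.
        delta[i] = dist[i, higher].min() if higher.size else dist[i].max()
    return np.argsort(rho * delta)[::-1][:n_centers]


def pairwise_constraints(labels):
    """Build must-link / cannot-link pairs from a dict {index: class label}."""
    must_link, cannot_link = [], []
    for (i, li), (j, lj) in combinations(sorted(labels.items()), 2):
        (must_link if li == lj else cannot_link).append((i, j))
    return must_link, cannot_link
```

The resulting pairs could then be passed to any constrained clustering routine. Which one the proposed method uses is not stated in the abstract.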
