Enhancing Cluster Center Identification in Density Peak Clustering

The density peak (DP) clustering algorithm is a promising clustering approach that has been shown to adapt to different types of datasets. It is built on a few simple assumptions, yet it performs well in many experiments. However, we find that local density is not very informative for identifying cluster centers, which may be one reason why the density parameter influences the clustering results. To address this problem and improve the DP algorithm, we study its cluster center identification process and find that what distinguishes cluster centers from non-density-peak data is not a large local density but their role as density peaks. We then propose to describe this role based on the local density of subordinates, and present a better alternative to the local density criterion. Experiments show that the new criterion helps isolate cluster centers from the other data. Combined with a new density kernel based on average distance, our algorithm outperforms several commonly used algorithms in experiments on various datasets.
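For orientation, the following is a minimal sketch of the baseline quantities used for cluster center identification in the original DP formulation ("Clustering by fast search and find of density peaks"): a local density `rho` and a distance `delta` to the nearest point of higher density. It does not implement the criterion or the average-distance kernel proposed here, whose details are not given in this abstract. The function name `density_peak_scores`, the cutoff parameter `d_c`, the `parent` array, and the toy data are illustrative assumptions.

```python
import numpy as np

def density_peak_scores(points, d_c):
    """Baseline DP quantities (illustrative sketch, not the proposed criterion).

    rho[i]    : number of other points closer than d_c (cutoff kernel)
    delta[i]  : distance to the nearest point with strictly higher rho,
                or the largest distance from i if no such point exists
    parent[i] : index of that nearest higher-density point, or -1
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Each point counts itself (distance 0), so subtract 1.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.empty(n)
    parent = np.full(n, -1)
    for i in range(n):
        higher = np.flatnonzero(rho > rho[i])
        if higher.size == 0:
            # Highest-density point(s): use the largest distance by convention.
            delta[i] = dist[i].max()
        else:
            j = higher[np.argmin(dist[i, higher])]
            delta[i] = dist[i, j]
            parent[i] = j
    return rho, delta, parent

# Toy data (hypothetical): two tight groups centered at (0, 0) and (5, 5).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0),
]
rho, delta, parent = density_peak_scores(points, d_c=0.12)
# Points with both large rho and large delta are candidate cluster centers.
gamma = rho * delta
centers = sorted(int(i) for i in np.argsort(gamma)[-2:])
```

In this baseline, candidate centers are the points for which both `rho` and `delta` are large. The value of `rho` depends directly on `d_c`, which is the kind of dependence on the density parameter that the abstract refers to.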
