GDPC: generalized density peaks clustering algorithm based on order similarity

Clustering is a fundamental approach to discovering valuable information in data mining and machine learning. Density peaks clustering (DPC) is a typical density-based clustering algorithm and has received increasing attention in recent years. However, DPC and most of its improvements still suffer from some drawbacks. For example, it is difficult to find peaks in sparse cluster regions, and the assignment of the remaining points tends to cause the Domino effect, especially on complicated data. To address these two problems, we propose a generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is computed from the rank order of the Euclidean distance between two samples. The order similarity helps find peaks in sparse regions. In addition, a two-step assignment strategy is used to weaken the Domino effect. Overall, GDPC can not only discover clusters of different sizes, dimensions and shapes, but also address the two issues above. Experiments on several datasets, including Lung, COIL20, ORL, USPS, MNIST, Breast and Vote, show that our algorithm is effective in most cases.
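
Two mechanisms from the abstract can be made concrete with a short sketch. The following is a minimal Python sketch that uses only the standard library. It shows (a) the rank-of-distance matrix from which an order similarity could be built and (b) the original DPC procedure of Rodriguez and Laio, whose one-step, chain-style assignment is what produces the Domino effect. The exact GDPC order-similarity formula and its two-step assignment are not given here, so neither is reproduced. The names `rank_matrix` and `dpc`, the cutoff-kernel density, the index tie-breaking, and the rule that the densest sample is always kept as a peak are all illustrative assumptions.

```python
import math


def rank_matrix(points):
    """ranks[i][j] = position of sample j when all samples are sorted by
    Euclidean distance from sample i (0 is i itself, 1 its nearest neighbour).
    Ties are broken by index so the result is deterministic (an assumption)."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted(range(n),
                       key=lambda j: (math.dist(points[i], points[j]), j != i, j))
        for pos, j in enumerate(order):
            ranks[i][j] = pos
    return ranks


def dpc(points, dc, n_clusters):
    """Baseline DPC (Rodriguez & Laio) with a cutoff kernel; not GDPC.
    Returns (labels, centers), where centers are indices of the chosen peaks."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other samples closer than the cutoff distance dc
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # visit samples by decreasing density; ties broken by index
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n  # nearest sample that comes earlier (denser) in `order`
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(d[i])  # densest sample: largest distance to any sample
            continue
        j = min(order[:pos], key=lambda k: d[i][k])
        delta[i], parent[i] = d[i][j], j
    # peaks: largest rho * delta; the densest sample is always kept as a peak
    # so that every assignment chain ends at an already labelled sample
    gamma = [rho[i] * delta[i] for i in range(n)]
    rest = sorted((i for i in range(n) if i != order[0]), key=lambda i: -gamma[i])
    centers = [order[0]] + rest[:n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # one-step assignment: each remaining sample copies its parent's label,
    # so one wrong label is passed down the whole chain (the Domino effect)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


dense = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
sparse = [(20, 20), (20, 25), (25, 20), (25, 25), (22.5, 22.5)]
print(rank_matrix(dense) == rank_matrix(sparse))  # True

blobs = dense + [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(dpc(blobs, 1.0, 2))  # ([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], [4, 9])
```

Because ranks depend only on the ordering of distances, the sparse blob (five times larger) yields the same rank matrix as the dense one. With a cutoff of 1.0, however, every point in the sparse blob has zero cutoff-kernel density. This is one way to see why purely distance-based densities can struggle to expose peaks in sparse regions.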
