Dynamic graph-based label propagation for density peaks clustering

Abstract Clustering is a major approach in data mining and machine learning and has been successful in many real-world applications. Density peaks clustering (DPC) is a recently published method that uses an intuitive idea to cluster data objects efficiently and effectively. However, DPC and most of its improved variants suffer from several shortcomings. First, the method considers only the global structure of the data, which can cause many clusters to be missed. Second, the cut-off distance affects the local density values and is calculated in different ways depending on the size of the dataset, which can influence the quality of clustering. Third, the original label assignment can cause a “chain reaction”: if a wrong label is assigned to one data point, many more wrong labels may subsequently be assigned to other points. In this paper, a density peaks clustering method called DPC-DLP is proposed. The proposed method employs the idea of k-nearest neighbors to compute the global cut-off parameter and the local density of each point. Moreover, it uses graph-based label propagation to assign labels to the remaining points and form the final clusters. This label propagation can effectively assign true labels to data instances located in border and overlapping regions. The proposed method is applicable to several domains. To make the method practical for image clustering, the local structure is used to obtain a low-dimensional space. In addition, the proposed method considers label space correlation so that it is effective for gene expression problems. Several experiments are performed to evaluate the performance of the proposed method on both synthetic and real-world datasets. The results demonstrate that, in most cases, the proposed method outperforms some state-of-the-art methods.
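The two ingredients named in the abstract, a k-nearest-neighbor local density and label propagation over a neighborhood graph, can be illustrated with a minimal sketch. This is not the DPC-DLP algorithm itself: the density formula (exponential of the negative mean kNN distance), the Gaussian edge weights, the clamped random-walk propagation rule, and the function name `knn_density_peaks` are all assumptions chosen for illustration, since the abstract does not give the exact definitions.

```python
import numpy as np

def knn_density_peaks(X, k=5, n_clusters=2, n_iter=20):
    """Illustrative sketch only; formulas are assumptions, not the paper's."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # k nearest neighbors of each point (column 0 of the sort is the point itself).
    knn = np.argsort(D, axis=1)[:, 1:k + 1]
    knn_dist = np.take_along_axis(D, knn, axis=1)
    # Assumed local density: larger when the k nearest neighbors are closer.
    rho = np.exp(-knn_dist.mean(axis=1))
    # delta: distance to the nearest point of higher density (as in standard DPC).
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i, higher].min() if higher.size else D[i].max()
    # Centers: points with the largest rho * delta score.
    centers = np.argsort(rho * delta)[-n_clusters:]
    # Symmetric kNN graph with Gaussian weights (assumed weighting).
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, knn.ravel()] = np.exp(-knn_dist.ravel() ** 2)
    W = np.maximum(W, W.T)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    # One-hot labels for the centers; every other point starts unlabeled.
    Y0 = np.zeros((n, n_clusters))
    Y0[centers, np.arange(n_clusters)] = 1.0
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = P @ Y                 # spread label mass along graph edges
        Y[centers] = Y0[centers]  # keep the center labels fixed
    return centers, Y.argmax(axis=1)
```

Because labels spread through the graph instead of being copied from a single nearest higher-density neighbor, one early mistake is less likely to cascade, which is the "chain reaction" concern raised in the abstract. Calling `knn_density_peaks(X, k=8, n_clusters=2, n_iter=50)` on two well-separated point clouds should place one center in each cloud and give each cloud a single label.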
