Comparative density peaks clustering

Abstract

Clustering analysis is one of the major topics in unsupervised machine learning. A recent study proposed a novel density-based clustering algorithm called Density Peaks. It rests on two intuitive assumptions: cluster centers have a higher density than their neighbors, and they lie at a relatively large distance from points of higher density. Deciding whether a distance is relatively large requires comparing it with other distances, yet the original algorithm does not model this comparison explicitly. We therefore propose the Comparative Density Peaks algorithm, which builds this comparison into the design of the method. Furthermore, we analyze Density Peaks from the perspective of its tree structure and identify two sufficient conditions for good clustering performance under the Density Peaks framework. Extensive experiments show that our proposed algorithm significantly outperforms the original Density Peaks clustering algorithm.
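To make the two assumptions concrete, here is a minimal sketch of the original Density Peaks procedure. It is not the Comparative Density Peaks method, whose details are not given here. The sketch assumes a cutoff kernel for the local density ρ (the number of other points closer than a cutoff `d_c`), defines δ as the distance to the nearest point of higher density, ranks candidate centers by γ = ρ·δ, and gives every other point the label of its nearest higher-density neighbor. The function name `density_peaks`, the tie-breaking rule, and selecting a fixed number `n_centers` of centers by γ are illustrative assumptions.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Sketch of baseline Density Peaks clustering (illustrative, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: cutoff-kernel density, i.e. number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points in decreasing density; ties broken by index (assumed rule).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    pos = {i: r for r, i in enumerate(order)}

    delta = [0.0] * n
    parent = [-1] * n  # nearest higher-density neighbor (a tree edge)
    for rank, i in enumerate(order[1:], start=1):
        # delta_i: distance to the nearest point ranked ahead of i.
        best = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][best]
        parent[i] = best

    # The densest point has no higher-density neighbor; use its largest distance.
    top = order[0]
    delta[top] = max(dist[top])

    # Centers: largest gamma = rho * delta; ties favor higher density,
    # which guarantees the densest point is always selected.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], pos[i]))[:n_centers]

    labels = [-1] * n
    for k, c in enumerate(centers):
        labels[c] = k
    # Parents are visited before children, so each parent already has a label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, centers, labels


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1)]
    rho, delta, centers, labels = density_peaks(pts, d_c=0.5, n_centers=2)
    print(centers, labels)
```

On the two-blob toy data in the usage block, the densest point of each blob has a large δ, so the two selected centers fall in different blobs and every other point inherits its blob's label. The `parent` pointers link every point except the densest one to its nearest higher-density neighbor, so they form a tree rooted at the global density peak. Selecting centers amounts to cutting the tree edges above them, which is one plausible reading of the tree-structure view mentioned in the abstract.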
