Merging DBSCAN and Density Peak for Robust Clustering

In data clustering, density-based algorithms are well known for their ability to detect clusters of arbitrary shape. DBSCAN is a widely used density-based clustering approach, and the recently proposed density peak (DP) algorithm has shown significant potential in experiments. However, DBSCAN may misclassify low-density border points as noise and does not work well when density varies greatly across clusters, while the DP algorithm depends heavily on the detected cluster centers. To circumvent these problems, we study the two algorithms and find that they have complementary properties. We then propose to combine them to overcome their respective problems. Specifically, we use the DP algorithm to detect cluster centers and then determine the parameters for DBSCAN adaptively. After DBSCAN clustering, we further use the DP algorithm to include low-density border points in clusters. By combining the complementary properties of the two algorithms, we relieve the problems of DBSCAN while avoiding the drawbacks of the DP algorithm. Our algorithm is tested on synthetic and real datasets and is demonstrated to perform better than the DBSCAN and DP algorithms, as well as some other clustering algorithms.
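
The three stages described above (DP center detection, DBSCAN with per-cluster parameters, DP-style assignment of leftover border points) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not say how the DBSCAN parameters are derived, so the rule used here (eps for each cluster is the distance from its center to that center's `min_pts`-th nearest neighbor) is an assumption, as are the function names `dp_stats` and `dp_dbscan_sketch`, the cut-off distance `d_c`, the fixed `n_clusters`, and the toy data.

```python
import math

def dp_stats(points, d_c):
    """Density peak quantities with a cut-off kernel: local density rho,
    distance delta to the nearest point of higher density, and that point (parent)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Decreasing density; ties are broken by index so the order is deterministic.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
        else:
            parent[i] = min(order[:rank], key=lambda k: dist[i][k])
            delta[i] = dist[i][parent[i]]
    return dist, rho, delta, parent, order

def dp_dbscan_sketch(points, d_c, n_clusters, min_pts=3, assign_border=True):
    n = len(points)
    dist, rho, delta, parent, order = dp_stats(points, d_c)
    # Stage 1: DP centers are the points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n  # -1 marks noise / not yet assigned
    for cid, c in enumerate(centers):
        if labels[c] != -1:
            continue  # center already absorbed by an earlier cluster
        # ASSUMPTION (not from the paper): per-cluster eps is the distance
        # from the center to its min_pts-th nearest neighbor.
        eps = sorted(dist[c])[min_pts]
        # Stage 2: DBSCAN-style expansion from the center with this eps.
        labels[c] = cid
        stack = [c]
        while stack:
            i = stack.pop()
            nbrs = [j for j in range(n) if dist[i][j] <= eps]
            if len(nbrs) < min_pts + 1:  # count includes i itself
                continue  # i is a border point, do not expand from it
            for j in nbrs:
                if labels[j] == -1:
                    labels[j] = cid
                    stack.append(j)
    # Stage 3: DP-style assignment; each leftover point takes the label of
    # its nearest higher-density neighbor, visited in decreasing density.
    if assign_border:
        for i in order:
            if labels[i] == -1 and parent[i] != -1:
                labels[i] = labels[parent[i]]
    return labels

# Toy data: a tight 3x3 grid, a loose 3x3 grid, and one point just outside the tight grid.
tight = [(i, j) for i in range(3) for j in range(3)]
loose = [(100 + 10 * i, 100 + 10 * j) for i in range(3) for j in range(3)]
data = tight + loose + [(5, 1)]
print(dp_dbscan_sketch(data, d_c=15, n_clusters=2, assign_border=False))
print(dp_dbscan_sketch(data, d_c=15, n_clusters=2))
```

In this toy example the two grids differ in spacing by a factor of ten, so a single global eps would not suit both; the per-cluster eps is one simple way to adapt to that. Running with `assign_border=False` shows which points the DBSCAN stage alone leaves as noise, which makes the effect of the final DP step visible.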
