AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities

Clustering is a typical data mining technique that partitions a dataset into multiple subsets of similar objects according to similarity metrics. In particular, density-based algorithms can find clusters of different shapes and sizes while remaining robust to noise objects. DBSCAN, a representative density-based algorithm, finds clusters by defining the density criterion with two global parameters, ε-distance and MinPts. However, most density-based algorithms, including DBSCAN, find clusters incorrectly when the dataset contains clusters of varying densities, because the density criterion is fixed by the global parameters and applied uniformly to all of them. Studies have sought to determine optimal parameters or to improve clustering performance using additional parameters and computations, but these approaches significantly increase the running time of clustering, particularly when the dataset is large. In this study, we focus on minimizing the additional computation required to determine the parameters by using an approximate adaptive ε-distance for each density, while still finding the clusters with varying densities that DBSCAN cannot find. Specifically, we propose a new tree structure based on a quadtree to define the density layers of a dataset. In addition, we propose approximate adaptive DBSCAN (AA-DBSCAN) and kAA-DBSCAN, which achieve clustering performance similar to that of existing algorithms for finding clusters with varying densities while significantly reducing the running time required for clustering. We evaluate the proposed algorithms, AA-DBSCAN and kAA-DBSCAN, through extensive experiments against state-of-the-art algorithms. The experimental results show that the proposed algorithms improve clustering performance and reduce running time.
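The limitation of global parameters described above can be illustrated with a minimal sketch of plain DBSCAN. This is not the proposed AA-DBSCAN or its quadtree-based structure; the function names (`region_query`, `dbscan`, `grid`, `summarize`), the toy dataset, and the parameter values are assumptions chosen only for illustration. The dataset contains two dense groups placed close together and one sparse group placed far away.

```python
from math import dist  # Python 3.8+

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN with global eps and min_pts; returns one label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

def grid(x0, y0, step, n=5):
    """Hypothetical helper: an n-by-n grid of points with the given spacing."""
    return [(x0 + i * step, y0 + j * step) for i in range(n) for j in range(n)]

def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    return len(set(labels) - {NOISE}), labels.count(NOISE)

# Two dense groups (spacing 0.1) side by side, one sparse group (spacing 1.0).
points = grid(0.0, 0.0, 0.1) + grid(1.0, 0.0, 0.1) + grid(10.0, 10.0, 1.0)

small = summarize(dbscan(points, eps=0.15, min_pts=3))
large = summarize(dbscan(points, eps=1.5, min_pts=3))
print(small)  # (2, 25): the sparse group is discarded as noise
print(large)  # (2, 0): the two dense groups are merged into one cluster
```

On this toy dataset, neither global ε-distance recovers all three groups: the small value loses the sparse group, and the large value merges the two dense groups. This is the situation that a per-density ε-distance is intended to address.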
