K-DBSCAN: Identifying Spatial Clusters with Differing Density Levels

Spatial clustering is an important tool in the analysis of spatial data. In this paper, we propose a novel density-based spatial clustering algorithm called K-DBSCAN, whose main focus is identifying clusters of points with similar spatial density. This contrasts with many other approaches, whose main focus is spatial contiguity. The strength of K-DBSCAN lies in finding arbitrarily shaped clusters in regions of variable density. Moreover, it can discover clusters that overlap spatially but differ in density. The primary goal is to separate the densest regions from lower-density regions, with spatial contiguity as a secondary goal. The original DBSCAN fails to discover clusters with variable density or overlapping regions. The OPTICS and Shared Nearest Neighbour (SNN) algorithms can cluster datasets of variable density, but they have their own limitations: both fail to detect overlapping clusters, and, when handling varying density, both merge points from different density levels. K-DBSCAN has two phases: first, it divides all data objects into density levels to identify the natural densities present in the dataset; then it extracts the clusters using a modified version of DBSCAN. Experimental results on both synthetic data and a real-world spatial dataset demonstrate the effectiveness of our clustering algorithm.
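The abstract does not say how the density levels are computed or how DBSCAN is modified, so the following Python sketch shows only one plausible reading of the two phases, under loudly stated assumptions. It assumes that (a) each point's density is estimated by the distance to its k-th nearest neighbour, (b) the density levels are found by one-dimensional k-means over those distances, and (c) the modified DBSCAN lets a cluster grow only through points of the same level, using a per-level eps equal to the largest k-distance in that level. All function names (`kth_nn_distance`, `density_levels`, `level_dbscan`, `k_dbscan_sketch`) and parameters are hypothetical and are not the authors' code.

```python
import math

def kth_nn_distance(points, k):
    """Assumed density estimate: distance from each point to its k-th nearest
    neighbour (brute force, O(n^2); requires len(points) > k)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return out

def density_levels(values, n_levels, iters=100):
    """Phase 1 (assumed): 1-D k-means on the k-distances.
    Returns one level per point; level 0 has the smallest k-distance (densest)."""
    lo, hi = min(values), max(values)
    # Spread the initial centroids evenly over the value range.
    centroids = [lo + (hi - lo) * (c + 0.5) / n_levels for c in range(n_levels)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(n_levels), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        new = []
        for c in range(n_levels):
            members = [v for v, l in zip(values, labels) if l == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    # Relabel so that level indices increase with the centroid value.
    order = sorted(range(n_levels), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[l] for l in labels]

def level_dbscan(points, levels, eps, min_pts):
    """Phase 2 (assumed modification): standard DBSCAN, except that a point's
    neighbourhood contains only points of its own density level, and eps is
    looked up per level. Returns cluster ids; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = unvisited, -1 = noise

    def neighbours(i):
        r = eps[levels[i]]
        return [j for j in range(n)
                if levels[j] == levels[i] and math.dist(points[i], points[j]) <= r]

    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # may later be claimed as a border point
            continue
        labels[i] = cid
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # only core points expand the cluster
                queue.extend(nb_j)
        cid += 1
    return labels

def k_dbscan_sketch(points, k=3, n_levels=2, min_pts=3):
    kd = kth_nn_distance(points, k)
    levels = density_levels(kd, n_levels)
    # Hypothetical choice: each level's eps is the largest k-distance in it.
    eps = {lv: max(d for d, l in zip(kd, levels) if l == lv) for lv in set(levels)}
    return levels, level_dbscan(points, levels, eps, min_pts)
```

For example, a tight 3×3 grid with spacing 0.1 and a loose 3×3 grid with spacing 1.0 fall into two different density levels and come out as two separate clusters with no noise. Restricting neighbourhoods to a single level is what keeps points from different density levels apart, the behaviour the abstract attributes to K-DBSCAN; the brute-force neighbour search is only suitable for small examples.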
