DGCL: An Efficient Density and Grid Based Clustering Algorithm for Large Spatial Database

Spatial clustering, which groups similar objects based on their distance, connectivity, or relative density in space, is an important component of spatial data mining. Clustering large data sets has always been a serious challenge for clustering algorithms, because the sheer volume of data makes the clustering process extremely costly. In this paper, we propose DGCL, an enhanced Density-Grid based Clustering algorithm for Large spatial databases. The characteristics of a dense area are enhanced by taking into account the influence of the surrounding area. Dense areas are then identified analytically as clusters by using a density threshold to remove sparse areas and outliers. We test the algorithm on synthetic datasets, and the results show the superiority of our approach.
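The general idea described above (grid the space, enhance each cell's density using its surroundings, drop cells below a threshold, and treat connected dense cells as clusters) can be illustrated with a minimal sketch. This is not the DGCL algorithm itself, whose details the abstract does not give. The function name `grid_density_clusters`, the parameters `cell_size`, `threshold`, and `neighbor_weight`, the 8-cell neighborhood, and the weighting formula are all assumptions made for illustration, and the sketch is restricted to 2-D points.

```python
from collections import defaultdict, deque


def grid_density_clusters(points, cell_size, threshold, neighbor_weight=0.5):
    """Label 2-D points by connected dense grid cells; -1 marks outliers.

    Hypothetical illustration only: the enhancement formula below is an
    assumption, not the formula used by DGCL.
    """
    def cell_of(point):
        x, y = point
        return (int(x // cell_size), int(y // cell_size))

    # Step 1: count the points that fall into each grid cell.
    counts = defaultdict(int)
    for p in points:
        counts[cell_of(p)] += 1

    # The 8 cells surrounding a given cell.
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)]

    # Step 2: enhanced density = own count plus a weighted mean of the
    # surrounding counts (assumed form of the "influence" term).
    def enhanced(cell):
        cx, cy = cell
        around = sum(counts.get((cx + dx, cy + dy), 0) for dx, dy in offsets)
        return counts[cell] + neighbor_weight * around / len(offsets)

    # Step 3: keep only the cells that reach the density threshold.
    dense = {c for c in list(counts) if enhanced(c) >= threshold}

    # Step 4: breadth-first search merges adjacent dense cells into clusters.
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx, dy in offsets:
                nb = (cx + dx, cy + dy)
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Step 5: points in sparse cells are reported as outliers (-1).
    return [labels.get(cell_of(p), -1) for p in points]


if __name__ == "__main__":
    blob_a = [(0.1 + 0.05 * i, 0.5) for i in range(10)]
    blob_b = [(5.1 + 0.05 * i, 5.5) for i in range(10)]
    noise = [(10.5, 0.5)]
    print(grid_density_clusters(blob_a + blob_b + noise, 1.0, 5))
```

Because the work is done per cell rather than per pair of points, the cost of a sketch like this grows with the number of points plus the number of occupied cells, which is the usual motivation for grid-based methods on large data sets.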
