Paper Information - A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

Clustering algorithms are attractive for the task of class identification in spatial databases. However, applying them to large spatial databases raises the following requirements: minimal domain knowledge needed to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN, which relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
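
The abstract does not spell out the procedure, so the following is only a minimal sketch of the density-based idea as it is commonly described, not the authors' implementation. The names `eps`, `min_pts`, `region_query`, and `dbscan` are assumptions chosen for illustration. The sketch exposes a second value, `min_pts`, which is assumed here to be fixed in advance so that only the neighborhood radius remains to be tuned. The neighborhood search is a brute-force linear scan, which is also an assumption; it would not reproduce the efficiency reported above, which depends on a spatial index.

```python
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet examined


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is treated as a core point when its eps-neighborhood holds at
    least min_pts points; clusters grow outward from core points.
    """
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Provisionally noise; may be relabeled later as a border point.
            labels[i] = NOISE
            continue
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    # Hypothetical toy data: two dense groups and one isolated point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

On the toy data, the two dense groups each receive their own cluster id and the isolated point is labeled as noise. Because clusters grow by chaining neighborhoods rather than by distance to a center, the same procedure can follow elongated or curved groups, which is the "arbitrary shape" property the abstract emphasizes.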

[1] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.

[2] Hans-Peter Kriegel, et al. The R*-tree: an efficient and robust access method for points and rectangles, 1990, SIGMOD '90.

[3] Peter J. Rousseeuw, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1990.

[4] Michael Stonebraker, et al. The SEQUOIA 2000 storage benchmark, 1993, SIGMOD '93.

[5] Philip K. Chan, et al. Systems for Knowledge Discovery in Databases, 1993, IEEE Trans. Knowl. Data Eng.

[6] Ralf Hartmut Güting, et al. An introduction to spatial database systems, 1994, VLDB J.

[7] Jiawei Han, et al. Efficient and Effective Clustering Methods for Spatial Data Mining, 1994, VLDB.

[8] Hans-Peter Kriegel, et al. Multi-step processing of spatial joins, 1994, SIGMOD '94.

[9] Joaquín Fernández-Valdivia, et al. A dynamic approach for clustering data, 1995, Signal Process.

[10] Hans-Peter Kriegel, et al. A Database Interface for Clustering in Large Spatial Databases, 1995, KDD.