FDBSCAN: A Fast DBSCAN Algorithm

Clustering has important applications in many fields, including data mining, statistical data analysis, pattern recognition, image processing, and various business applications. Many clustering algorithms have been developed to date. Among them, DBSCAN, which originated in the database research community, stands out for its good performance in clustering spatial data. Relying on a density-based notion of clusters, DBSCAN is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it. In this paper, we develop a fast DBSCAN algorithm (FDBSCAN) that considerably speeds up the original DBSCAN algorithm. Unlike DBSCAN, which uses every point in a core point's neighborhood as a seed, FDBSCAN uses only a small number of representative points in that neighborhood as seeds to expand the cluster, so that fewer region queries are executed and the I/O cost is reduced accordingly. Experimental results show that FDBSCAN is effective and efficient in clustering large-scale databases, and that it is several times faster than the original DBSCAN algorithm.
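
As an illustration of the seeding idea only, the following Python sketch expands clusters from a few representative neighbors instead of from every neighbor. It is a minimal sketch under stated assumptions, not the algorithm as specified in this paper: the names region_query, pick_representatives, and fdbscan_sketch, the parameters eps and min_pts, and in particular the rule for choosing representatives (the extreme neighbors along each coordinate axis) are all hypothetical. The sketch neither merges clusters nor recovers points skipped during expansion, so its labels may differ from those of the original DBSCAN.

```python
import math

NOISE = -1  # label for points that are not density-reachable from any core point


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (brute force, no index)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def pick_representatives(points, center, neighbors):
    """Hypothetical selection rule (an assumption, not taken from the paper):
    for each axis, keep the neighbor with the smallest and the largest coordinate."""
    reps = set()
    for axis in range(len(points[center])):
        reps.add(min(neighbors, key=lambda j: points[j][axis]))
        reps.add(max(neighbors, key=lambda j: points[j][axis]))
    reps.discard(center)
    return reps


def fdbscan_sketch(points, eps, min_pts):
    """Label points and return (labels, number_of_region_queries)."""
    labels = [None] * len(points)
    queried = set()  # points for which a region query has been executed
    cluster_id = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        seeds = [start]
        grew = False
        while seeds:
            p = seeds.pop()
            if p in queried:
                continue
            queried.add(p)
            neighbors = region_query(points, p, eps)
            if len(neighbors) < min_pts:
                # Not a core point: noise unless some cluster already claimed it.
                if labels[p] is None:
                    labels[p] = NOISE
                continue
            grew = True
            # Every unclaimed neighbor of a core point joins the current cluster.
            for j in neighbors:
                if labels[j] is None or labels[j] == NOISE:
                    labels[j] = cluster_id
            # Only the representatives, not all neighbors, become new seeds.
            seeds.extend(pick_representatives(points, p, neighbors) - queried)
        if grew:
            cluster_id += 1
    return labels, len(queried)
```

The original DBSCAN executes one region query per point, so the second return value can be compared with len(points) to gauge how many queries a given representative rule saves on a particular data set.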