An Ensemble Density-based Clustering Method

Density-based clustering is valued for its ability to find clusters of arbitrary shape and to determine the number of clusters automatically. DBSCAN is a widely used density-based clustering algorithm. It requires a density threshold, which is hard to choose adaptively, to decide whether an object lies in a dense or a sparse region. In this paper we apply the concept of the clustering ensemble to avoid the difficulty of selecting a single appropriate threshold. DBSCAN is run multiple times with diverse thresholds drawn from a pre-constructed interval, and the final partition is derived via a consensus function. Experimental results show that this method outperforms DBSCAN in both validity and stability, and avoids the inefficiency caused by inappropriate thresholds.
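The procedure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which DBSCAN parameter serves as the density threshold or which consensus function is used. The sketch assumes the threshold is the neighborhood radius `eps` with a fixed `min_pts`, and it swaps in a simple co-association (evidence accumulation) vote as the consensus function. The names `dbscan`, `ensemble_dbscan`, and `vote` are hypothetical.

```python
from itertools import combinations
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(points)
    # Neighborhoods include the point itself, as in the usual definition.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n                      # None = not visited yet
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1                   # provisional noise, may become border
            continue
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster          # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is a core point: keep expanding
        cluster += 1
    return labels


def ensemble_dbscan(points, eps_values, min_pts, vote=0.5):
    """Assumed consensus: link pairs co-clustered in more than `vote` of the runs."""
    n = len(points)
    together = {}                            # (i, j) -> runs that co-cluster i and j
    for eps in eps_values:
        labels = dbscan(points, eps, min_pts)
        for i, j in combinations(range(n), 2):
            if labels[i] != -1 and labels[i] == labels[j]:
                together[i, j] = together.get((i, j), 0) + 1

    parent = list(range(n))                  # union-find over the linked pairs

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    linked = set()
    for (i, j), count in together.items():
        if count / len(eps_values) > vote:
            parent[find(i)] = find(j)
            linked.update((i, j))

    roots = {}                               # component root -> final cluster id
    return [roots.setdefault(find(i), len(roots)) if i in linked else -1
            for i in range(n)]


# Toy data: two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
eps_values = [0.5, 1.0, 1.5, 2.0, 15.0]      # deliberately includes bad thresholds

print(dbscan(points, 0.5, 3))                # too small: everything is noise
print(dbscan(points, 15.0, 3))               # too large: the two squares merge
print(ensemble_dbscan(points, eps_values, 3))
# -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the thresholds 0.5 and 15.0 each give a poor partition on their own, but they are outvoted by the three runs that separate the two squares, so the consensus still recovers both clusters and leaves the isolated point as noise.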
