Paper Information - Density-Based Clustering and Anomaly Detection

Density-Based Clustering and Anomaly Detection

As of 1996, when the density-based clustering algorithm DBSCAN was published (Ester et al., 1996), existing clustering techniques fell into two categories: partitioning methods and hierarchical methods. Partitioning clustering attempts to break a data set into K clusters such that the partition optimizes a given criterion. Besides the difficulty of choosing a proper value for K and the inability to discover clusters of arbitrary shape, partitioning techniques are very sensitive to outliers. Although the k-medoids method (Kaufman & Rousseeuw, 1990) is more robust than k-means (MacQueen, 1967) in the presence of outliers, neither method can identify outliers, because every point is still assigned to some cluster. Hierarchical clustering algorithms produce a nested sequence of clusters, with a single all-inclusive cluster at the top and single-point clusters at the bottom. CURE (Guha et al., 1998) can find clusters of arbitrary shape and reduces the effect of outliers; however, it considers only cluster proximity and ignores cluster interconnectivity, and an outlier is still assigned to the cluster whose representative point is closest to it.
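The robustness claim above can be illustrated with a minimal sketch. It assumes one-dimensional toy data invented for this example (not taken from the paper), and the helper names `medoid` and `nearest_center` are hypothetical. It compares a mean-based center (as in k-means) with a medoid-based center (as in k-medoids) when one extreme value is present, and shows that nearest-center assignment still places the outlier in a cluster.

```python
from statistics import mean

def medoid(points):
    """Return the data point with the smallest total absolute distance to all points."""
    return min(points, key=lambda p: sum(abs(p - q) for q in points))

def nearest_center(x, centers):
    """Return the index of the center closest to x (partitioning-style assignment)."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

# Toy data: a tight group of values plus one extreme value (illustrative only).
cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
outlier = 100.0
with_outlier = cluster + [outlier]

# The mean is pulled far outside the tight group by the single extreme value.
mean_center = mean(with_outlier)

# The medoid must be an actual data point, so it stays inside the tight group.
medoid_center = medoid(with_outlier)

# Nearest-center assignment always returns some cluster index, even for the
# extreme value, so the outlier is never flagged as an outlier.
centers = [medoid_center, 50.0]  # second center is an arbitrary illustrative value
outlier_cluster = nearest_center(outlier, centers)

print("mean-based center:  ", mean_center)
print("medoid-based center:", medoid_center)
print("outlier assigned to cluster index:", outlier_cluster)
```

The medoid stays within the dense group while the mean does not, yet in both cases the extreme value ends up in a cluster. This is the gap that density-based methods such as DBSCAN address by labeling low-density points as noise.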

Lian Duan

[1] Hans-Peter Kriegel, et al. LOF: identifying density-based local outliers, 2000, SIGMOD '00.

[2] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[3] Hans-Peter Kriegel, et al. OPTICS: ordering points to identify the clustering structure, 1999, SIGMOD '99.

[4] Lian Duan, et al. A Local Density Based Spatial Clustering Algorithm with Noise, 2006, 2006 IEEE International Conference on Systems, Man and Cybernetics.

[5] Ali S. Hadi, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1991.

[6] Sudipto Guha, et al. CURE: an efficient clustering algorithm for large databases, 1998, SIGMOD '98.

[7] R. Suganya, et al. Data Mining Concepts and Techniques, 2010.

[8] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[9] Ying Liu, et al. Cluster-based outlier detection, 2009, Ann. Oper. Res.

[10] Peter J. Rousseeuw, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1990.

[11] Daniel A. Keim, et al. An Efficient Approach to Clustering in Large Multimedia Databases with Noise, 1998, KDD.