Paper information - Class compactness for data clustering

Class compactness for data clustering

In this paper we introduce a compactness-based clustering algorithm. The compactness of a data class is measured by comparing its inter-subset and intra-subset distances: the class compactness of a subset is defined as the ratio of the two distances. A subset is called an isolated cluster (or icluster) if its class compactness is greater than 1. Together, all iclusters form a containment tree. We introduce monotonic sequences of iclusters to simplify the structure of the icluster tree, and design a clustering algorithm on that basis. The algorithm is effective on data sets whose clusters are nonlinearly separated, have arbitrary shapes, or differ in density. Its effectiveness is demonstrated by experiments.
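The abstract does not spell out how the two distances are computed, so the following Python sketch is only an illustration under stated assumptions, not the paper's definition. It assumes that the inter-subset distance is the smallest Euclidean distance between a point in the subset and a point outside it, that the intra-subset distance is the longest edge of the subset's minimum spanning tree, and that class compactness is the inter-subset distance divided by the intra-subset distance. All function names are hypothetical.

```python
import math


def intra_subset_distance(points):
    """ASSUMPTION: longest edge of the minimum spanning tree of the subset,
    built with Prim's algorithm on Euclidean distances."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each point to the tree
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # Pick the point outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return longest


def inter_subset_distance(subset, rest):
    """ASSUMPTION: smallest distance from the subset to the remaining points.
    `rest` must be non-empty."""
    return min(math.dist(p, q) for p in subset for q in rest)


def class_compactness(subset, rest):
    """ASSUMPTION: ratio taken as inter-subset over intra-subset distance."""
    intra = intra_subset_distance(subset)
    inter = inter_subset_distance(subset, rest)
    return math.inf if intra == 0 else inter / intra


def is_icluster(subset, rest):
    """A subset is an icluster if its class compactness is greater than 1."""
    return class_compactness(subset, rest) > 1


# A tight group far from the other points, and a mixed group that is not isolated.
tight = [(0, 0), (1, 0), (0, 1)]
far = [(10, 10), (11, 10)]
mixed = [(0, 0), (10, 10)]
others = [(1, 0), (0, 1), (11, 10)]
print(is_icluster(tight, far), is_icluster(mixed, others))
```

Under these assumptions, a subset whose nearest outside point is farther away than its own widest internal gap has compactness above 1, which matches the intuition of an isolated cluster.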

Yuqing Song
