Data Set Homeomorphism Transformation Based Meta-clustering

Clustering analysis is an important data mining technique with a variety of applications. In this paper, the data set is treated dynamically, and a Data Set Homeomorphism Transformation Based Meta-Clustering algorithm (DSHTBMC) is proposed. DSHTBMC decomposes the clustering task into multiple stages. It first constructs a series of homeomorphic data sets ordered from high regularity to low, and then iteratively clusters each homeomorphic data set using the clustering result of the preceding one. Because highly regular data sets are easier to cluster, and the clustering result of each homeomorphic data set can be used to induce high-quality clusters in the next one, the difficulty of the problem is reduced. Two strategies for data set homeomorphism transformation, Displacement and Noising, are proposed. With the classical hierarchical divisive method Bisecting k-means as DSHTBMC's subordinate clustering algorithm, two new clustering algorithms, HD-DSHTBMC-D and HD-DSHTBMC-N, are obtained. Experimental results indicate that the new algorithms achieve markedly better clustering quality than Bisecting k-means.
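The staged idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Displacement and Noising transformations work, so the code below makes several assumptions: `smooth` is a hypothetical stand-in transformation (each point is pulled toward the mean of its nearest neighbours, with strength `t`), plain k-means replaces Bisecting k-means as the subordinate algorithm, and the names `smooth`, `kmeans`, `meta_cluster`, and `levels` are illustrative only. It is not the authors' method, only an instance of "cluster the most regular version first, then reuse the result on the next one".

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smooth(points, t, m=3):
    """Hypothetical stand-in for a regularizing transformation (assumption).

    Each point moves a fraction t of the way toward the mean of its m
    nearest neighbours; t=0 returns the original coordinates.
    Needs at least two points.
    """
    out = []
    for p in points:
        # Index 0 of the sorted list is p itself (distance 0), so skip it.
        nbrs = sorted(points, key=lambda q: sq_dist(p, q))[1:m + 1]
        mean = [sum(x) / len(nbrs) for x in zip(*nbrs)]
        out.append(tuple((1 - t) * a + t * b for a, b in zip(p, mean)))
    return out


def kmeans(points, centroids, iters=20):
    """Plain Lloyd iterations started from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def meta_cluster(points, init_centroids, levels=(0.9, 0.6, 0.3, 0.0)):
    """Cluster a series of data sets from most to least regular.

    The centroids found at one level seed the clustering of the next,
    and the final level (t=0) is the original data set.
    """
    centroids = list(init_centroids)
    labels = []
    for t in levels:
        stage = smooth(points, t)
        labels, centroids = kmeans(stage, centroids)
    return labels, centroids
```

Calling `meta_cluster(points, init_centroids)` with a list of coordinate tuples and one starting centroid per cluster returns the labels and centroids for the original data. The warm start between levels is the only part taken from the abstract; everything else is a placeholder that could be swapped for the paper's actual transformations.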
