PFHC: A clustering algorithm based on data partitioning for unevenly distributed datasets

Clustering has attracted considerable research effort as a primary data mining method for knowledge discovery, but few studies have focused on unevenly distributed datasets. In previous work, we proposed FHC, an efficient hierarchical algorithm based on fuzzy graph connectedness, to discover clusters of arbitrary shape. In this paper, we present PFHC, a novel clustering algorithm for unevenly distributed datasets that extends FHC. PFHC first divides the dataset into several local spaces according to the density of the data distribution, so that the data density within each local space is nearly uniform. Local values of ε and λ are then used in each local space to obtain a local clustering result with FHC. Next, the boundaries between local spaces are taken into account for combination. Finally, the local clusters are merged to obtain the global clusters. As an extension of FHC, PFHC handles uneven datasets more effectively and efficiently, and experiments show that it generates better-quality clusters than other methods. Furthermore, PFHC is also found to be able to process incremental data.
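The partition, local clustering, and merge stages described in the abstract can be illustrated with a rough sketch. This is not the PFHC or FHC algorithm: plain single-linkage grouping with a distance threshold stands in for FHC, the k-nearest-neighbour distance is an assumed density proxy, and the names `partition_by_density`, `cluster_uneven`, `k`, and `ratio` are hypothetical choices made only for illustration.

```python
from itertools import combinations
from math import dist


def knn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (assumes >= 2 points)."""
    others = sorted(dist(points[i], q) for j, q in enumerate(points) if j != i)
    return others[min(k, len(others)) - 1]


def partition_by_density(points, k=2, ratio=3.0):
    """Split point indices into a dense and a sparse part (hypothetical rule).

    A point counts as sparse when its k-NN distance exceeds `ratio` times the
    smallest k-NN distance in the data. Each part gets its own local threshold:
    the largest k-NN distance among its members.
    """
    scores = [knn_distance(points, i, k) for i in range(len(points))]
    cutoff = ratio * min(scores)
    dense = [i for i, s in enumerate(scores) if s <= cutoff]
    sparse = [i for i, s in enumerate(scores) if s > cutoff]
    return [(part, max(scores[i] for i in part)) for part in (dense, sparse) if part]


def cluster_uneven(points, k=2, ratio=3.0):
    """Partition by density, cluster each part locally, then merge across parts."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    parts = partition_by_density(points, k, ratio)

    # Local clustering: link points of the same part using that part's threshold.
    # (Single linkage is a stand-in for FHC with local parameters.)
    for indices, threshold in parts:
        for i, j in combinations(indices, 2):
            if dist(points[i], points[j]) <= threshold:
                union(i, j)

    # Merge step: link points from different parts, conservatively using the
    # smaller of the two local thresholds (an assumption, not the paper's rule).
    for (idx_a, thr_a), (idx_b, thr_b) in combinations(parts, 2):
        for i in idx_a:
            for j in idx_b:
                if dist(points[i], points[j]) <= min(thr_a, thr_b):
                    union(i, j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two tight clusters close to each other plus one loose cluster far away.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),          # dense cluster A (spacing 1)
    (4, 0), (5, 0), (4, 1), (5, 1),          # dense cluster B (spacing 1)
    (20, 20), (25, 20), (20, 25), (25, 25),  # sparse cluster (spacing 5)
]
clusters = cluster_uneven(points)
print(clusters)
```

In this toy data, a single global threshold small enough to separate the two tight clusters would break the loose cluster into isolated points, while one large enough to hold the loose cluster together would fuse the two tight ones. Using a separate threshold per density region avoids both outcomes, which is the motivation the abstract gives for local parameters.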
