A size-insensitive integrity-based fuzzy c-means method for data clustering

Fuzzy c-means (FCM) is one of the most popular techniques for data clustering. Because FCM tends to balance the number of data points in each cluster, the centers of smaller clusters are forced to drift toward larger adjacent clusters. For datasets with unbalanced clusters, the partition results of FCM are therefore usually unsatisfactory. Cluster size insensitive FCM (csiFCM) addresses this "cluster-size sensitivity" problem by dynamically adjusting the condition value for the membership of each data point based on cluster size after the defuzzification step in each iterative cycle. However, the performance of csiFCM is sensitive to both the initial positions of the cluster centers and the "distance" between adjacent clusters. In this paper, we present a cluster size insensitive integrity-based FCM method, called siibFCM, to overcome these deficiencies of csiFCM. The siibFCM method determines the membership contribution of every data point to each individual cluster by considering the cluster's integrity, which is a combination of compactness and purity. "Compactness" represents the distribution of data points within a cluster, while "purity" represents how far a cluster is from its adjacent cluster. We tested siibFCM and compared it extensively with the traditional FCM and csiFCM methods using artificially generated datasets with different shapes and data distributions, synthetic images, real images, and an Escherichia coli dataset. Experimental results showed that siibFCM outperforms both traditional FCM and csiFCM in terms of tolerance for the "distance" between adjacent clusters and flexibility in selecting initial cluster centers when dealing with datasets with unbalanced clusters.
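For orientation, the baseline that csiFCM and siibFCM modify can be sketched as follows. This is a minimal sketch of the standard FCM alternating update only; it is not the siibFCM or csiFCM procedure, whose formulas are not given in this abstract. The function name `fcm`, the fuzzifier `m = 2`, the random initialization of memberships, and the toy unbalanced dataset are all assumptions made for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative sketch, not siibFCM).

    X: (n, d) data array, c: number of clusters, m: fuzzifier (> 1).
    Returns (centers, U), where U[i, k] is the membership of point i
    in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumed initialization: random memberships normalized per point.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Center update: mean of the data weighted by U^m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical unbalanced toy data: one large and one small cluster.
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([
    demo_rng.normal(loc=0.0, scale=1.0, size=(1000, 2)),
    demo_rng.normal(loc=4.0, scale=0.5, size=(50, 2)),
])
centers_demo, U_demo = fcm(X_demo, c=2)
labels_demo = U_demo.argmax(axis=1)  # defuzzification: hardest membership
print("centers:\n", centers_demo)
print("points per cluster:", np.bincount(labels_demo, minlength=2))
```

Comparing the printed centers and cluster counts with the generating parameters (1000 versus 50 points) is one way to observe how far the smaller cluster's center is pulled toward the larger one on a given dataset.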
