Interactive clustering and merging with a new fuzzy expected value

Abstract

Major problems exist in both crisp and fuzzy clustering algorithms. Algorithms of the fuzzy c-means type use weights determined by a power m of inverse distances, and m remains fixed over all iterations and over all clusters, even though smaller clusters should have a larger m. Our method instead uses a different “distance” for each cluster, which changes over the early iterations to fit the clusters. Comparisons with fuzzy c-means show improved results. We also address other perplexing problems in clustering: (i) finding the optimal number K of clusters; (ii) assessing the validity of a given clustering; (iii) preventing the selection of seed vectors as initial prototypes from affecting the clustering; (iv) preventing the order of merging from affecting the clustering; and (v) permitting the clusters to form more natural shapes rather than forcing them into the normed balls of the distance function. We begin with a relatively large number K of uniformly randomly distributed seeds and then thin them, leaving fewer seeds that are still uniformly distributed. The main loop then iterates, assigning the feature vectors and computing new fuzzy prototypes. Our fuzzy merging then merges any clusters that are too close to each other. We use a modified Xie-Beni validity measure as the measure of clustering goodness for multiple values of K in an interactive approach: after viewing the results thus far, the user selects two parameters, one for eliminating clusters and one for merging clusters. The algorithm is compared with fuzzy c-means on the Iris data and on the Wisconsin breast cancer data.
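For reference, the following is a minimal sketch of the standard fuzzy c-means baseline the abstract criticizes, together with the unmodified Xie-Beni index. In this baseline a single fuzzifier m is shared by every cluster and every iteration. Xie and Beni's original index uses squared memberships, which corresponds to m = 2 here. The sketch is not the authors' method: their per-cluster adaptive "distance", seed thinning, fuzzy merging and modified Xie-Beni measure are not specified in the abstract and are not reproduced. The function names and the use of caller-supplied initial prototypes are illustrative assumptions.

```python
import math

# Illustrative sketch of STANDARD fuzzy c-means and the UNMODIFIED Xie-Beni
# index. Function names are hypothetical; this is not the paper's algorithm.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_memberships(data, protos, m):
    """FCM membership step with one fixed fuzzifier m > 1 for all clusters.

    u[i][k] = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)), d = Euclidean distance.
    """
    c, n = len(protos), len(data)
    u = [[0.0] * n for _ in range(c)]
    p = 1.0 / (m - 1.0)  # exponent applied to ratios of *squared* distances
    for k, x in enumerate(data):
        d2 = [sq_dist(x, v) for v in protos]
        hits = [i for i, d in enumerate(d2) if d == 0.0]
        if hits:
            # x sits exactly on a prototype: split its membership among those.
            for i in hits:
                u[i][k] = 1.0 / len(hits)
            continue
        for i in range(c):
            u[i][k] = 1.0 / sum((d2[i] / d2[j]) ** p for j in range(c))
    return u


def update_prototypes(data, u, m):
    """Prototype step: v_i = sum_k u_ik**m x_k / sum_k u_ik**m."""
    dim = len(data[0])
    protos = []
    for row in u:
        w = [uik ** m for uik in row]
        total = sum(w)
        protos.append([sum(wk * x[d] for wk, x in zip(w, data)) / total
                       for d in range(dim)])
    return protos


def fcm(data, init, m=2.0, iters=100, tol=1e-9):
    """Alternate the two steps, starting from caller-supplied prototypes."""
    protos = [list(v) for v in init]
    for _ in range(iters):
        new = update_prototypes(data, update_memberships(data, protos, m), m)
        shift = max(sq_dist(a, b) for a, b in zip(protos, new))
        protos = new
        if shift < tol:
            break
    return protos, update_memberships(data, protos, m)


def xie_beni(data, protos, u, m=2.0):
    """Unmodified Xie-Beni index: compactness / (n * min squared separation).

    Lower is better; coincident prototypes give zero separation, so inf.
    """
    compact = sum(u[i][k] ** m * sq_dist(x, v)
                  for i, v in enumerate(protos) for k, x in enumerate(data))
    sep = min(sq_dist(protos[i], protos[j])
              for i in range(len(protos)) for j in range(i + 1, len(protos)))
    return math.inf if sep == 0.0 else compact / (len(data) * sep)


if __name__ == "__main__":
    blob_a = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]]
    blob_b = [[5.0, 5.0], [5.3, 5.0], [5.6, 5.0], [5.9, 5.0]]
    data = blob_a + blob_b
    for init in ([[1.0, 1.0], [5.0, 5.0]],
                 [[0.0, 0.0], [5.0, 5.0], [5.9, 5.0]]):
        protos, u = fcm(data, init)
        print(len(init), "clusters, Xie-Beni =", xie_beni(data, protos, u))
```

On the two toy groups in the usage block, the two-cluster run should score lower (better) than the three-cluster run. This is a simplified picture of how a validity index can compare several values of K. In the paper, that comparison feeds an interactive choice made by the user rather than an automatic minimum.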
