M&MFCM: Fuzzy C-means Clustering with Mahalanobis and Minkowski Distance Metrics

Abstract

The proposed modification of the conventional fuzzy C-means (FCM) clustering algorithm aims to correct some of its shortcomings. We focus on four of them: limited flexibility in adapting the number of clusters; limited ability to group clusters of different types; an objective function that is less than optimal for clusters of unequal size lying very close to each other; and considerable computational time, particularly for high-dimensional data. With M&MFCM we propose replacing the usual Euclidean distance with Mahalanobis and Minkowski metrics. This enhances the cluster detection capacity of FCM by allowing clusters of arbitrary shape to be detected more accurately in high-dimensional datasets. Directly replacing the Euclidean distance in the FCM objective function with the Mahalanobis distance can cause numerical problems, because the largest eigenvalues of the fuzzy covariance matrix can produce extremely elongated clusters that contradict the real data distribution. We avoid this by fixing the ratio between the maximal and minimal eigenvalues of the covariance matrix. The parameterized Minkowski distance is adapted for use with FCM under various parameter settings. We also propose an approach for improving the initial choice of the number of clusters, and for visualizing and analyzing clustering results on both labeled and unlabeled datasets. Experimental results demonstrate that the proposed M&MFCM and test methodology significantly improve FCM clustering results.
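To illustrate what swapping the distance inside FCM involves, here is a minimal sketch of an FCM loop that uses the Minkowski (L_p) distance. It rests on several assumptions rather than on the paper's exact formulation. The objective is taken to be the sum of u_ik^m d_p(x_k, v_i)^2. Memberships use the standard FCM update, which is the exact minimiser for fixed prototypes under any distance. Prototypes use the membership-weighted mean, which is the exact minimiser only for p = 2, so for other p it is a simplification. The function names `minkowski` and `fcm_minkowski`, the random initialisation, and the stopping rule are all hypothetical.

```python
import random

def minkowski(a, b, p):
    """Minkowski (L_p) distance between two equal-length vectors; p = 2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def fcm_minkowski(data, c, m=2.0, p=2.0, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical FCM sketch with a Minkowski distance. Returns (centers, memberships),
    where memberships[k][i] is the degree to which point k belongs to cluster i."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random fuzzy partition: each point's memberships sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    centers = []
    for _ in range(max_iter):
        # Prototype update: membership-weighted mean (exact minimiser only for p = 2).
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * data[k][j] for k in range(n)) / sw for j in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        new_u = []
        for k in range(n):
            # Clamp distances away from zero so a point sitting on a centre does not divide by 0.
            d = [max(minkowski(data[k], centers[i], p), 1e-12) for i in range(c)]
            new_u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                          for i in range(c)])
        # Stop when no membership changes by more than tol.
        shift = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

For example, `fcm_minkowski(data, 2, p=1.0)` clusters with the city-block distance, and `p=2.0` recovers ordinary Euclidean FCM. A larger p weights the largest per-coordinate difference more heavily.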
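The eigenvalue-ratio fix for the Mahalanobis case can be sketched with NumPy. The sketch assumes the usual fuzzy covariance of cluster i: the sum over k of u_ik^m (x_k - v_i)(x_k - v_i)^T, divided by the sum over k of u_ik^m. It also assumes one particular way of "fixing" the ratio: every eigenvalue smaller than lambda_max / max_ratio is raised to that value, and the matrix is rebuilt from its eigenvectors. The paper may enforce the ratio differently. The names `fuzzy_covariance`, `cap_eigen_ratio`, `mahalanobis_sq`, and the parameter `max_ratio` (with its illustrative default of 100) are hypothetical.

```python
import numpy as np

def fuzzy_covariance(X, v, u_i, m=2.0):
    """Fuzzy covariance of one cluster: sum_k u_ik^m (x_k - v)(x_k - v)^T / sum_k u_ik^m."""
    w = u_i ** m
    diff = X - v
    return (w[:, None] * diff).T @ diff / w.sum()

def cap_eigen_ratio(F, max_ratio=100.0):
    """Hypothetical regulariser: raise small eigenvalues so lambda_max / lambda_min <= max_ratio.
    Assumes F is symmetric with a positive largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(F)          # eigenvalues in ascending order
    lam_max = eigvals[-1]
    eigvals = np.maximum(eigvals, lam_max / max_ratio)
    return eigvecs @ np.diag(eigvals) @ eigvecs.T

def mahalanobis_sq(X, v, F):
    """Squared Mahalanobis distance from each row of X to centre v under covariance F."""
    diff = X - v
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(F), diff)

# Points spread along x, very thin along y: the raw covariance is nearly degenerate.
X = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 0.1], [0.0, -0.1]])
v = np.zeros(2)
F = fuzzy_covariance(X, v, np.ones(len(X)))       # eigenvalue ratio 4.5 / 0.005 = 900
F_capped = cap_eigen_ratio(F, max_ratio=100.0)    # small eigenvalue raised to 0.045
probe = np.array([[0.0, 1.0]])
print(mahalanobis_sq(probe, v, F), mahalanobis_sq(probe, v, F_capped))
```

Under the raw covariance, a point one unit off the long axis is about 200 squared units away. After capping, it is about 22, so the thin cluster no longer pushes such points away as strongly. In a full clustering loop, one would recompute and cap F for each cluster at every iteration and use `mahalanobis_sq` in place of the squared Euclidean distance in the membership update.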
