A Comparative Study of Hard and Fuzzy Data Clustering Algorithms with Cluster Validity Indices

Data clustering is an important data mining method. It is the process of partitioning a data set into classes so that objects in the same class are as similar as possible and objects in different classes are as dissimilar as possible. The well-known hard clustering algorithm (K-means) and fuzzy clustering algorithm (Fuzzy C-Means, FCM) are mostly based on the Euclidean distance measure. This paper presents a comparative study of these algorithms with other distance measures, such as Chebyshev and Chi-square. The resulting algorithms are tested on four well-known data sets from the UCI repository: Contraceptive Method Choice (CMC), Diabetes, Liver Disorders and Statlog (Heart). Experimental results show that FCM with the Chi-square distance measure gives better results than FCM with the Chebyshev distance measure. We also propose an FCM algorithm based on the σ-distance measure. FCM is also evaluated with cluster validity indices such as the partition coefficient and partition entropy. The results show that the Chebyshev distance measure yields a higher partition coefficient and a lower partition entropy than the other distance measures. This paper also provides a brief review of applications of the K-means and Fuzzy C-means algorithms.
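The following is a minimal pure-Python sketch of the setup described above: FCM with a pluggable distance measure (Euclidean, Chebyshev or Chi-square) and the two validity indices used to compare the resulting partitions. It is an illustration under stated assumptions, not the paper's implementation. All function names are hypothetical. The Chi-square distance uses the common form sum((x - y)^2 / (x + y)) and assumes non-negative features; the paper's exact variant is not given here. The center update is the standard membership-weighted mean, which is optimal only for Euclidean distance and is used as a heuristic for the other measures. The σ-distance measure is not implemented, since its definition is not stated in this abstract. The toy data set is made up and is not one of the UCI sets.

```python
import math
import random

# Distance measures compared in the study (Chi-square form is an assumption).
def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def chi_square(x, y):
    # Assumes non-negative features; coordinates where a + b == 0 contribute nothing.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)

def fcm(data, c, dist, m=2.0, iters=100, tol=1e-6, seed=0):
    """Hypothetical FCM sketch with a pluggable distance; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    centers = []
    for _ in range(iters):
        # Center update: weighted mean with weights u_ij^m (Euclidean-optimal, heuristic otherwise).
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * data[i][k] for i in range(n)) / sw for k in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        new_u = []
        for i in range(n):
            d = [dist(data[i], v) for v in centers]
            zeros = [j for j in range(c) if d[j] == 0]
            if zeros:
                # The point coincides with one or more centers: split membership among them.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                p = 2.0 / (m - 1.0)
                row = [1.0 / sum((d[j] / d[k]) ** p for k in range(c)) for j in range(c)]
            new_u.append(row)
        delta = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if delta < tol:
            break
    return centers, u

def partition_coefficient(u):
    # PC = (1/N) * sum_i sum_j u_ij^2, in [1/c, 1]; higher means a crisper partition.
    return sum(x * x for row in u for x in row) / len(u)

def partition_entropy(u):
    # PE = -(1/N) * sum_i sum_j u_ij * ln(u_ij), in [0, ln c]; lower means a crisper partition.
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / len(u)

# Usage on a made-up toy data set with two well-separated groups.
data = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
for name, dist in [("euclidean", euclidean), ("chebyshev", chebyshev), ("chi-square", chi_square)]:
    centers, u = fcm(data, c=2, dist=dist)
    print(name, round(partition_coefficient(u), 3), round(partition_entropy(u), 3))
```

Keeping the distance as a function argument means the same FCM loop is reused for every measure, so any difference in the validity indices comes only from the distance function (and from the shared center-update heuristic).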
