Comparative Study of Fuzzy k-Nearest Neighbor and Fuzzy C-means Algorithms

Fuzzy clustering techniques handle the fuzzy relationships among data points and between data points and cluster centers (which may be termed cluster fuzziness). Distance measures, in turn, are needed to quantify this fuzziness. Together, these two parameters govern the quality of the clusters and the run time. Visualizing multidimensional data clusters in lower dimensions is another important research area, as it helps reveal hidden patterns within the clusters. This paper investigates the effects of cluster fuzziness and three distance measures, namely Manhattan distance (MH), Euclidean distance (ED), and Cosine distance (COS), on the Fuzzy c-means (FCM) and Fuzzy k-nearest neighbor (FkNN) clustering techniques, applied to the Iris and extended Wine data. Cluster quality is assessed based on (i) the data discrepancy factor (DDF, proposed in this study), (ii) cluster size, (iii) compactness, (iv) distinctiveness, (v) execution time, and (vi) cluster fuzziness (m) values. The study observes that FCM handles cluster fuzziness better than FkNN, and that the MH distance measure yields the best clusters with both FCM and FkNN. Finally, the best clusters are visualized using a Self-Organizing Map (SOM).
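To make the interplay between the distance measure and the fuzziness parameter m concrete, the following is a minimal Python sketch. It assumes the textbook definitions of the three distances and the standard FCM membership update; the abstract does not give the paper's exact formulation, so the function names (`manhattan`, `euclidean`, `cosine_distance`, `fcm_memberships`) and the default m = 2 are illustrative assumptions rather than the authors' implementation.

```python
import math

def manhattan(a, b):
    # MH: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # ED: square root of the sum of squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # COS: 1 minus cosine similarity (assumes both vectors are non-zero)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def fcm_memberships(point, centers, m=2.0, dist=euclidean):
    # Standard FCM update (assumed here, not taken from the paper):
    # u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with m > 1
    d = [dist(point, c) for c in centers]
    if any(di == 0 for di in d):
        # The point coincides with one or more centers: share membership among them
        hits = [1.0 if di == 0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]

if __name__ == "__main__":
    centers = [[0.0, 0.0], [3.0, 0.0]]
    for name, dist in [("MH", manhattan), ("ED", euclidean), ("COS", cosine_distance)]:
        # Shift the centers for COS so that no vector is the zero vector
        cs = [[1.0, 0.0], [0.0, 1.0]] if dist is cosine_distance else centers
        print(name, fcm_memberships([1.0, 0.5], cs, m=2.0, dist=dist))
```

In this sketch, raising m flattens the memberships toward equal shares across clusters, while values of m close to 1 push them toward a hard assignment; swapping `dist` changes which center a point is considered closest to.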
