Rough Set Based Fuzzy K-Modes for Categorical Data

With the growing demand for categorical data clustering, this paper proposes a new hybrid clustering algorithm, Rough Set Based Fuzzy K-Modes. The algorithm integrates the principles of rough sets and fuzzy sets. The lower and upper approximations of rough sets give better handling of uncertainty, vagueness, and incompleteness in class definition, while the membership function of fuzzy sets enables efficient handling of overlapping partitions. The superiority of the proposed method over state-of-the-art methods is demonstrated quantitatively on two artificial and two real-life categorical data sets. A statistical significance test has also been carried out to establish the significance of the clustering results.
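
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes the simple matching dissimilarity and the standard fuzzy k-modes membership formula with fuzzifier m, and it assumes a threshold rule for the rough approximations: an object enters the lower approximation of its best cluster only when its highest membership exceeds the second highest by more than delta. The function names, the parameter delta, and the sample values are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def matching_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def fuzzy_memberships(point: Sequence[str],
                      modes: Sequence[Sequence[str]],
                      m: float = 2.0) -> List[float]:
    """Fuzzy k-modes style memberships of one object to every mode (m > 1)."""
    dists = [matching_distance(point, z) for z in modes]
    zeros = [i for i, d in enumerate(dists) if d == 0]
    if zeros:
        # The object coincides with at least one mode: share membership
        # equally among those modes and give 0 to the rest.
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(dists))]
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def rough_assign(u: Sequence[float],
                 delta: float = 0.2) -> Tuple[Set[int], Set[int]]:
    """Return (lower, upper): cluster indices whose lower / upper
    approximation contains the object. Assumed rule, not from the paper."""
    order = sorted(range(len(u)), key=lambda i: u[i], reverse=True)
    if len(order) == 1:
        return {order[0]}, {order[0]}
    best, second = order[0], order[1]
    if u[best] - u[second] > delta:
        # Clear winner: the object belongs definitely to one cluster.
        return {best}, {best}
    # Ambiguous: the object lies in the boundary of both clusters.
    return set(), {best, second}


modes = [("a", "b", "c"), ("x", "y", "z")]
u = fuzzy_memberships(("a", "y", "c"), modes)
lower, upper = rough_assign(u, delta=0.5)
```

In this sketch the sample object differs from the first mode in one attribute and from the second in two, so with m = 2 its memberships are 2/3 and 1/3. With delta = 0.5 the gap is too small, and the object falls in the boundary region of both clusters instead of the lower approximation of either.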
