A modified Fuzzy k-Partition based on indiscernibility relation for categorical data clustering

Categorical data clustering has been adopted by many scientific communities to classify objects from large databases. The Fuzzy k-Partition approach has been proposed for this task. However, existing Fuzzy k-Partition approaches suffer from high computational time and low clustering accuracy. Moreover, the parameters that maximize the classification likelihood function in the Fuzzy k-Partition approach always correspond to the same categories, and hence produce the same results. To overcome these issues, we propose a modified Fuzzy k-Partition based on the indiscernibility relation. The indiscernibility relation induces an approximation space constructed from equivalence classes of indiscernible objects, so it can be applied to classify categorical data. The novelty of the proposed approach is that, unlike previous approaches that use the likelihood function of multivariate multinomial distributions, it is based on the indiscernibility relation. We present a theoretical analysis of the proposed approach showing that it achieves lower computational complexity. Further, we compare the proposed approach with the Fuzzy Centroid and Fuzzy k-Partition approaches in terms of response time and clustering accuracy on several UCI benchmark and real-world datasets. The results show that the proposed approach achieves lower response time and higher clustering accuracy than the other Fuzzy k-based approaches.
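
To make the notion of equivalence classes of indiscernible objects concrete, the following minimal Python sketch groups objects that take identical values on a chosen set of categorical attributes. It illustrates only the standard rough-set indiscernibility relation, not the modified Fuzzy k-Partition algorithm itself. The function name `indiscernibility_classes`, the toy table `toy`, and its attribute names are hypothetical and introduced purely for illustration.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attributes):
    """Partition object ids into equivalence classes of IND(attributes).

    Two objects fall in the same class when they agree on every
    attribute in `attributes`. `objects` maps an object id to a dict
    of categorical attribute values (an assumed input layout).
    """
    classes = defaultdict(list)
    for obj_id, row in objects.items():
        # The tuple of attribute values identifies the equivalence class.
        key = tuple(row[a] for a in attributes)
        classes[key].append(obj_id)
    # Sort for a deterministic, readable result.
    return sorted(sorted(members) for members in classes.values())


# Hypothetical toy information table with two categorical attributes.
toy = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "small"},
    "x3": {"color": "blue", "size": "small"},
    "x4": {"color": "blue", "size": "large"},
}

by_color = indiscernibility_classes(toy, ["color"])
by_both = indiscernibility_classes(toy, ["color", "size"])
print(by_color)  # [['x1', 'x2'], ['x3', 'x4']]
print(by_both)   # [['x1', 'x2'], ['x3'], ['x4']]
```

Adding attributes can only refine the partition: objects x3 and x4 are indiscernible by color alone but become discernible once size is also considered.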
