Genetic intuitionistic weighted fuzzy k-modes algorithm for categorical data

Abstract: Clustering of data with categorical attributes is widely used in real-world applications. Most existing clustering algorithms for categorical data have two major drawbacks: they can terminate at a locally optimal solution, and they treat all attributes as equally important. This study therefore proposes a novel clustering method, the genetic intuitionistic weighted fuzzy k-modes (GIWFKM) algorithm, based on the conventional fuzzy k-modes algorithm and the genetic algorithm (GA). The study first introduces the intuitionistic weighted fuzzy k-modes (IWFKM) algorithm, which employs intuitionistic fuzzy sets in the clustering process together with a new similarity measure for categorical data based on a frequency probability-based distance metric. The GIWFKM algorithm then integrates the IWFKM algorithm with GA to search for a globally optimal solution. In addition, the GIWFKM algorithm performs unsupervised feature selection based on the correlation coefficient to remove redundant features, which can both improve clustering performance and reduce computational time. To evaluate the clustering results, a series of experiments on different categorical datasets compares the proposed algorithms with benchmark algorithms: fuzzy k-modes, weighted fuzzy k-modes, genetic fuzzy k-modes, space structure-based clustering, and many-objective fuzzy centroids clustering. The experimental results on datasets from the UCI Machine Learning Repository show that the GIWFKM algorithm outperforms the benchmark algorithms in terms of Adjusted Rand Index (ARI) and clustering accuracy (CA).
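For orientation, the conventional fuzzy k-modes iteration that the proposed method builds on can be sketched as follows. This is a minimal illustration of the baseline only, assuming the simple matching dissimilarity and a fuzziness exponent `alpha`. It does not implement the intuitionistic fuzzy set, the attribute weighting, the frequency probability-based distance, or the GA search of GIWFKM. The function names and the toy records are hypothetical.

```python
from collections import defaultdict


def simple_matching(x, z):
    """Count the attributes on which two categorical records disagree."""
    return sum(a != b for a, b in zip(x, z))


def update_memberships(data, modes, alpha=2.0):
    """Return one membership row per record; each row sums to 1."""
    exponent = 1.0 / (alpha - 1.0)
    memberships = []
    for x in data:
        dists = [simple_matching(x, z) for z in modes]
        if 0 in dists:
            # The record coincides with a mode: full membership in that
            # cluster (the first one, if several modes are identical).
            row = [0.0] * len(modes)
            row[dists.index(0)] = 1.0
        else:
            row = [
                1.0 / sum((d_l / d_h) ** exponent for d_h in dists)
                for d_l in dists
            ]
        memberships.append(row)
    return memberships


def update_modes(data, memberships, n_clusters, alpha=2.0):
    """Per cluster and attribute, pick the category with the largest
    sum of memberships raised to the power alpha."""
    n_attrs = len(data[0])
    modes = []
    for l in range(n_clusters):
        mode = []
        for j in range(n_attrs):
            score = defaultdict(float)
            for x, row in zip(data, memberships):
                score[x[j]] += row[l] ** alpha
            # Sorting the categories first makes tie-breaking deterministic.
            mode.append(max(sorted(score), key=score.get))
        modes.append(tuple(mode))
    return modes


# Toy data: five records with two categorical attributes (illustrative only).
data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("a", "y")]
modes = [("a", "x"), ("b", "y")]
memberships = update_memberships(data, modes)
modes = update_modes(data, memberships, n_clusters=2)
```

Alternating these two updates until the modes stop changing yields a local optimum that depends on the initial modes, which is the limitation the abstract cites as motivation for adding a GA-based global search.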
