FKMAWCW: Categorical fuzzy k-modes clustering with automated attribute-weight and cluster-weight learning

Abstract The fuzzy k-modes (FKM) algorithm is a popular method for clustering categorical data. Its main drawback is its sensitivity to the initialization of the cluster centers: inappropriate initial centers lead to poor local optima. A second drawback is that FKM gives all attributes equal importance during clustering, whereas in real applications some attributes are more important than others. Several FKM variants have been presented in the literature, each of which addresses one of these problems. In this paper, we propose a new clustering method, FKMAWCW, that addresses both problems at the same time. In the proposed clustering process, a local attribute weighting mechanism weights the attributes of each cluster appropriately, and a cluster weighting mechanism addresses the initialization sensitivity. Attribute weights and cluster weights are learned simultaneously and automatically during the clustering process. In addition, a new distance function is proposed to reduce noise sensitivity, so that the algorithm can tolerate noisy environments. Extensive experiments on 11 benchmark datasets and one artificially generated dataset show that the proposed algorithm performs better than state-of-the-art algorithms. The paper derives the update functions mathematically and provides a convergence proof for the algorithm. The implementation source code of FKMAWCW is publicly available at https://github.com/Amin-Golzari-Oskouei/FKMAWCW .
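
For orientation, the following is a minimal sketch of the baseline fuzzy k-modes procedure that the abstract takes as its starting point. It is not the proposed FKMAWCW method: it uses the simple matching dissimilarity and has no attribute weights, cluster weights, or noise-tolerant distance. The function names, the integer encoding of categories, the fuzzifier default `m=1.5`, the random initialization, and the equal split of membership among exactly matching modes are all assumptions made for illustration.

```python
import numpy as np

def matching_distance(X, modes):
    # X: (n, p) integer-coded categorical data; modes: (k, p).
    # Returns a (k, n) array counting the attributes on which each object differs from each mode.
    return (X[None, :, :] != modes[:, None, :]).sum(axis=2).astype(float)

def update_memberships(D, m=1.5):
    # D: (k, n) dissimilarities. Returns (k, n) fuzzy memberships; each column sums to 1.
    U = np.zeros(D.shape)
    zero = D == 0
    has_zero = zero.any(axis=0)
    # Objects identical to one or more modes: split membership equally
    # among those modes (an illustrative choice for handling ties).
    U[:, has_zero] = zero[:, has_zero] / zero[:, has_zero].sum(axis=0)
    # All other objects: standard fuzzy update, proportional to d^(-1/(m-1)).
    inv = D[:, ~has_zero] ** (-1.0 / (m - 1.0))
    U[:, ~has_zero] = inv / inv.sum(axis=0)
    return U

def update_modes(X, U, m=1.5):
    # For each cluster and attribute, pick the category with the largest
    # total fuzzified membership (sum of u^m over objects in that category).
    k, p = U.shape[0], X.shape[1]
    W = U ** m
    modes = np.empty((k, p), dtype=X.dtype)
    for j in range(p):
        cats = np.unique(X[:, j])
        onehot = (X[:, j][:, None] == cats[None, :]).astype(float)  # (n, c)
        mass = W @ onehot                                           # (k, c)
        modes[:, j] = cats[mass.argmax(axis=1)]
    return modes

def fuzzy_k_modes(X, k, m=1.5, n_iter=20, seed=0):
    # Random initial modes: this is exactly the step whose sensitivity
    # the abstract describes as a weakness of plain FKM.
    rng = np.random.default_rng(seed)
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        U = update_memberships(matching_distance(X, modes), m)
        new_modes = update_modes(X, U, m)
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    U = update_memberships(matching_distance(X, modes), m)
    return U, modes
```

Calling `fuzzy_k_modes(X, k=2)` on an integer-coded array returns the membership matrix and the cluster modes. Because every attribute contributes equally to `matching_distance` and the starting modes are drawn at random, the sketch exhibits both limitations the abstract sets out to address.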
