CLUSTERING CATEGORICAL DATA USING k-MODES BASED ON CUCKOO SEARCH OPTIMIZATION ALGORITHM

Cluster analysis is an unsupervised learning technique that finds interesting patterns in data objects without knowing their class labels. Most real-world datasets contain categorical data. For example, a social media analysis may include categorical attributes such as gender (male or female). The k-modes clustering algorithm is the most widely used method for grouping categorical data because it is easy to implement and handles large amounts of data efficiently. However, because it selects its initial centroids at random, it often converges to a local optimum rather than the global one. A number of optimization algorithms have been developed to obtain the global optimum solution. Cuckoo Search is a population-based metaheuristic optimization algorithm designed to provide the global optimum solution. Methods: In this paper, the k-modes clustering algorithm is combined with the Cuckoo Search algorithm to obtain the global optimum solution. Results: Experiments are conducted on benchmark datasets, and the results are compared with those of k-modes and of Particle Swarm Optimization combined with k-modes to demonstrate the efficiency of the proposed algorithm.
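The sensitivity of k-modes to its random starting centroids can be illustrated with a minimal sketch. It assumes the standard simple-matching dissimilarity (the number of attributes on which two objects differ) and a per-attribute mode update. The function names `mismatch`, `total_cost`, and `k_modes` and the toy data are hypothetical and are not taken from the paper. The sketch covers only plain k-modes, not the Cuckoo Search hybrid.

```python
import random
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: count of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def total_cost(data, modes):
    """Sum over all objects of the distance to the nearest mode."""
    return sum(min(mismatch(row, m) for m in modes) for row in data)


def k_modes(data, k, seed=0, max_iter=20):
    """Plain k-modes started from k randomly sampled objects (illustrative only)."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # random initial modes: the source of local optima
    for _ in range(max_iter):
        # Assignment step: attach each object to its nearest mode.
        clusters = [[] for _ in range(k)]
        for row in data:
            nearest = min(range(k), key=lambda i: mismatch(row, modes[i]))
            clusters[nearest].append(row)
        # Update step: the new mode takes the most frequent value per attribute.
        new_modes = []
        for j, members in enumerate(clusters):
            if not members:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:  # converged: modes no longer change
            break
        modes = new_modes
    return modes, total_cost(data, modes)


if __name__ == "__main__":
    # Toy categorical data (made up for illustration): (gender, device, plan).
    toy = [
        ("male", "phone", "free"), ("male", "phone", "paid"),
        ("female", "laptop", "paid"), ("female", "laptop", "free"),
        ("male", "laptop", "free"), ("female", "phone", "paid"),
    ]
    for s in range(5):
        _, cost = k_modes(toy, k=2, seed=s)
        print(f"seed={s} cost={cost}")
```

Running the loop with different seeds may give different final costs, which is the local-optimum behaviour described above. A metaheuristic such as Cuckoo Search would, presumably, search over candidate mode sets using a cost of this kind as its objective. That mapping is an assumption here, since the abstract does not spell it out.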
