A Genetic XK-Means Algorithm with Empty Cluster Reassignment

K-Means is a well-known and widely used classical clustering algorithm, but it is easily trapped in a local optimum and is sensitive to the initial choice of cluster centers. XK-Means (eXploratory K-Means) was introduced in the literature to address this: it adds an exploratory disturbance to the vector of cluster centers so that the search can escape local optima and depend less on the initial centers. However, empty clusters may appear during the iterations of XK-Means and reduce the efficiency of the algorithm. This paper introduces an empty-cluster-reassignment technique and uses it to modify XK-Means, resulting in the EXK-Means clustering algorithm. Furthermore, we combine EXK-Means with a genetic mechanism to form a genetic XK-Means algorithm with empty-cluster reassignment, referred to as the GEXK-Means clustering algorithm. The convergence of GEXK-Means to the global optimum is proved theoretically. Numerical experiments on several real-world clustering problems show the advantage of EXK-Means over XK-Means, and the advantage of GEXK-Means over EXK-Means, XK-Means, K-Means and GXK-Means (genetic XK-Means).
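To make the two ideas above concrete, the following is a minimal sketch of one K-Means-style iteration with an exploratory disturbance and an empty-cluster reassignment. It is an illustration under stated assumptions, not the procedure defined in the paper: the abstract does not specify the form of the disturbance or the reassignment rule, so the uniform noise, the "move the worst-fitting point from a cluster with more than one member" rule, and the names `exk_means_step`, `sq_dist`, and `scale` are all hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def exk_means_step(points, centers, scale=0.0, rng=random):
    """One iteration: disturb centers, assign points, repair empty clusters, update.

    Assumes len(points) >= len(centers), so a donor point always exists.
    """
    k = len(centers)

    # Exploratory disturbance (assumed form): shift every coordinate of every
    # center by uniform noise in [-scale, scale].
    disturbed = [[c + scale * rng.uniform(-1.0, 1.0) for c in center]
                 for center in centers]

    # Standard K-Means assignment to the nearest (disturbed) center.
    labels, dists = [], []
    for p in points:
        d = [sq_dist(p, c) for c in disturbed]
        j = min(range(k), key=d.__getitem__)
        labels.append(j)
        dists.append(d[j])

    # Empty-cluster reassignment (assumed rule): give each empty cluster the
    # point that lies farthest from its own center, taken only from clusters
    # that keep at least one member afterwards.
    for j in range(k):
        if j in labels:
            continue
        counts = [labels.count(c) for c in range(k)]
        donors = [i for i, lab in enumerate(labels) if counts[lab] > 1]
        i = max(donors, key=dists.__getitem__)
        labels[i] = j
        dists[i] = 0.0  # the moved point will coincide with its new center

    # Standard K-Means update: each center becomes the mean of its members.
    new_centers = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        new_centers.append([sum(col) / len(members) for col in zip(*members)])
    return new_centers, labels
```

With `scale=0.0` the step reduces to plain K-Means plus the repair: if a center sits far from all data, its cluster would otherwise be empty and the mean update would be undefined, whereas here every one of the `k` clusters keeps at least one member. A genetic variant in the spirit of GEXK-Means would presumably apply such a step to each individual in a population of center vectors, but that part is not sketched here.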
