Advantages and limitations of genetic algorithms for clustering records

Clustering is a fundamental and widely used method for grouping similar records into the same cluster and dissimilar records into different clusters. A major problem in cluster analysis is determining the appropriate number of clusters in advance, which is difficult for a user (data miner) to estimate. Another limitation of K-means, a well-known clustering technique, is that it can get stuck at local optima. To overcome these limitations, Genetic Algorithm (GA) based clustering techniques were proposed in the 1990s. Since then, many researchers have developed evolutionary-algorithm-based clustering techniques, including GA-based ones, and applied them in various fields. This paper presents an up-to-date review of major GA-based clustering techniques from the last twenty (20) years. A total of 45 ranked (i.e., ranked by citation reports and JCR/CORE rank) GA-based clustering approaches are reviewed, covering real-life applications such as real-life data sets, highway construction projects, a gas company, cellular networks and satellite image segmentation. Almost two thirds of the techniques do not require the user to define the number of clusters. Finally, a thorough discussion and emerging research directions are presented.
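The central idea above, that a GA can search over both the number of clusters and their centres instead of fixing K in advance, can be illustrated with a minimal sketch. It is not any specific technique from the review; all names (`ga_cluster`, `davies_bouldin`, `crossover`, `mutate`, `tournament`) and parameter values are hypothetical. Assumptions: a chromosome is a variable-length list of centroids (between 2 and `k_max`), the cost is the Davies-Bouldin index of the partition the centroids induce (lower is better), parents are chosen by binary tournament, crossover is one-point on the centroid lists, and mutation jitters coordinates and occasionally adds or removes a centroid. Because a whole population is searched and the best chromosome is kept by elitism, the search is less tied to a single starting point than one K-means run.

```python
import math
import random


def assign(data, centroids):
    """Label each record with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in data]


def davies_bouldin(data, labels):
    """Davies-Bouldin index of a partition (lower is better).

    Empty clusters are ignored; returns inf for degenerate partitions
    (fewer than two clusters, or two clusters with identical means)."""
    groups = {}
    for p, lab in zip(data, labels):
        groups.setdefault(lab, []).append(p)
    clusters = list(groups.values())
    k = len(clusters)
    if k < 2:
        return math.inf
    means = [tuple(sum(xs) / len(c) for xs in zip(*c)) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, means)]
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i != j:
                sep = math.dist(means[i], means[j])
                if sep == 0:
                    return math.inf
                worst = max(worst, (scatter[i] + scatter[j]) / sep)
        total += worst
    return total / k


def tournament(scored, rng):
    """Binary tournament: the lower-cost of two random chromosomes wins."""
    (ca, a), (cb, b) = rng.sample(scored, 2)
    return a if ca <= cb else b


def crossover(a, b, k_max, rng):
    """One-point crossover on variable-length centroid lists (child has >= 2 genes)."""
    child = a[:rng.randint(1, len(a))] + b[rng.randrange(len(b)):]
    return child[:k_max]


def mutate(chrom, data, k_max, rate, sigma, rng):
    """Jitter coordinates, then occasionally add or drop a centroid."""
    chrom = [tuple(x + rng.gauss(0, sigma) if rng.random() < rate else x for x in c)
             for c in chrom]
    r = rng.random()
    if r < rate and len(chrom) < k_max:
        chrom.append(rng.choice(data))        # grow K by one
    elif r < 2 * rate and len(chrom) > 2:
        chrom.pop(rng.randrange(len(chrom)))  # shrink K by one
    return chrom


def ga_cluster(data, k_max=8, pop_size=30, generations=50, rate=0.1, seed=0):
    """Evolve centroid sets of varying size; return (centroids, labels, history)."""
    rng = random.Random(seed)
    sigma = 0.05 * max(max(xs) - min(xs) for xs in zip(*data))

    def cost(chrom):
        return davies_bouldin(data, assign(data, chrom))

    # Initial population: K drawn uniformly from [2, k_max], centroids drawn from the records.
    pop = [rng.sample(data, rng.randint(2, k_max)) for _ in range(pop_size)]
    history = []  # lowest cost in the population at the start of each generation
    for _ in range(generations):
        scored = sorted(((cost(ch), ch) for ch in pop), key=lambda t: t[0])
        history.append(scored[0][0])
        nxt = [scored[0][1]]  # elitism: the best chromosome survives unchanged
        while len(nxt) < pop_size:
            child = crossover(tournament(scored, rng), tournament(scored, rng), k_max, rng)
            nxt.append(mutate(child, data, k_max, rate, sigma, rng))
        pop = nxt
    best = min(pop, key=cost)
    return best, assign(data, best), history


# Usage on synthetic data: three well-separated 2-D blobs (hypothetical example data).
rng = random.Random(1)
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in centres for _ in range(30)]
centroids, labels, history = ga_cluster(data)
print("clusters found:", len(set(labels)), "DB index:", round(davies_bouldin(data, labels), 3))
```

Note that the number of clusters is never supplied by the user: it emerges from which centroid sets score best under the validity index. On data this well separated one would expect three clusters to be recovered, but a GA is stochastic, so results depend on the seed, population size and number of generations. Many of the hybrid approaches surveyed additionally refine the GA output with K-means iterations; that step is omitted here for brevity.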
