Branches of evolutionary algorithms and their effectiveness for clustering records

Clustering aims to group similar records into the same cluster and dissimilar records into different clusters. K-means is one of the most popular and well-known clustering techniques because it is simple and lightweight. However, its main drawback is that it requires a user (data miner) to estimate the number of clusters in advance. Another limitation is its tendency to get stuck at local optima. To overcome these limitations, many evolutionary algorithm based clustering techniques have been proposed since the 1990s and applied to various fields. In this paper, we present an up-to-date review of major evolutionary algorithm based clustering techniques from the last twenty years (1995-2015). A total of 63 ranked (i.e. based on citation reports and JCR/CORE rank) evolutionary algorithm based clustering approaches are reviewed. Most of these techniques do not require the user to define the number of clusters in advance. We present the advantages and limitations of some of these techniques, along with a thorough discussion and future research directions.
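The two K-means limitations named above can be illustrated with a minimal sketch. It is not taken from the paper: the function names `sq_dist` and `kmeans`, the four-point toy dataset, and both sets of starting centroids are assumptions chosen only to make the effect visible. The sketch runs plain Lloyd-style K-means, in which the number of clusters is fixed by the number of initial centroids supplied, and shows that a poor start converges to a worse solution than a good one.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Lloyd-style K-means from the given initial centroids.

    The number of clusters k is len(centroids), so the caller must
    choose it in advance (the first limitation noted in the abstract).
    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical toy data: two tight pairs of points, far apart horizontally.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# A poor start splits the data horizontally and never recovers (local optimum).
_, bad_sse = kmeans(points, [(2, 0), (2, 1)])
# A good start finds the left/right split.
_, good_sse = kmeans(points, [(0, 0.5), (4, 0.5)])

print(bad_sse, good_sse)  # prints "16 1.0"
```

Both runs converge immediately, but the first stays at an SSE sixteen times larger than the second. Population-based searches such as the evolutionary techniques surveyed here are one way to explore many candidate solutions rather than relying on a single starting point.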
