A Survey of Evolutionary Algorithms for Clustering

This paper presents a survey of evolutionary algorithms designed for clustering tasks. It tries to reflect the profile of this area by focusing on the subjects that have received more attention in the literature. Accordingly, most of the paper is devoted to partitional algorithms that look for hard clusterings of data, though overlapping (i.e., soft and fuzzy) approaches are also covered. The paper is original in two main respects. First, it provides an up-to-date overview that is fully devoted to evolutionary algorithms for clustering, is not limited to any particular kind of evolutionary approach, and covers advanced topics such as multiobjective and ensemble-based evolutionary clustering. Second, it provides a taxonomy that highlights several important aspects of evolutionary data clustering, among them: fixed or variable number of clusters; cluster-oriented or nonoriented operators; context-sensitive or context-insensitive operators; guided or unguided operators; binary, integer, or real encodings; and centroid-based, medoid-based, label-based, tree-based, or graph-based representations. A number of references are provided that describe applications of evolutionary algorithms for clustering in different domains, such as image processing, computer security, and bioinformatics. The paper ends by addressing some important issues and open questions that can be the subject of future research.
