Large-Scale Experimental Evaluation of Cluster Representations for Multiobjective Evolutionary Clustering

Multiobjective evolutionary clustering algorithms optimize several objective functions simultaneously, with the search guided by an evolutionary algorithm cycle. This allows them to find better solutions than conventional clustering algorithms, provided that a suitable individual representation is selected. This paper provides a detailed analysis of the three most relevant and useful representations (prototype-based, label-based, and graph-based) on a wide set of synthetic data sets. The representations are also compared with relevant conventional clustering algorithms. Experiments show that multiobjective evolutionary clustering is competitive with other clustering algorithms. Furthermore, the scenario in which each representation performs best is identified.
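
To make the three representations concrete, the following is a minimal sketch of how each kind of individual could be decoded into a partition. It is an illustration under common textbook assumptions, not the encoding used in the paper: the prototype-based genome is assumed to be a list of centroids with nearest-centroid assignment, the label-based genome is assumed to store one cluster label per point, and the graph-based genome is assumed to follow a locus-based adjacency scheme in which gene i holds the index of a point linked to point i. All function names and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def decode_prototype(genome, points):
    """Prototype-based (assumed): genome is a list of centroid coordinates.
    Each point is assigned to the index of its nearest centroid."""
    return [min(range(len(genome)), key=lambda k: dist(p, genome[k]))
            for p in points]


def decode_label(genome):
    """Label-based (assumed): genome[i] is already the cluster label of point i."""
    return list(genome)


def decode_graph(genome):
    """Graph-based (assumed locus-based adjacency): genome[i] = j links
    point i to point j; clusters are the connected components."""
    parent = list(range(len(genome)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(genome):
        parent[find(i)] = find(j)

    # Number the components in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(genome))]


def canonical(labels):
    """Relabel clusters by first appearance so partitions can be compared."""
    seen = {}
    return [seen.setdefault(label, len(seen)) for label in labels]


# Toy data: two well-separated groups of two points each (hypothetical).
points = [(0, 0), (0, 1), (5, 5), (5, 6)]

prototype_genome = [(0, 0.5), (5, 5.5)]  # two centroids
label_genome = [1, 1, 0, 0]              # one label per point
graph_genome = [1, 0, 3, 2]              # point i is linked to genome[i]

partitions = [
    canonical(decode_prototype(prototype_genome, points)),
    canonical(decode_label(label_genome)),
    canonical(decode_graph(graph_genome)),
]
```

In this toy case all three genomes decode to the same partition, which shows that the representations differ in how the search space is encoded rather than in what partitions they can describe. Note that genome length depends on the number of clusters in the prototype-based case and on the number of points in the other two.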
