An Investigation of Representations and Operators for Evolutionary Data Clustering with a Variable Number of Clusters

This paper analyses the properties of four alternative representation/operator combinations suitable for data clustering algorithms that keep the number of clusters variable. These representations are investigated in terms of their performance when used in a multiobjective evolutionary clustering algorithm (MOCK), which we have described previously. To shed light on the observed performance differences, we consider the relative size of the search space and the heuristic bias inherent in each representation, as well as its locality and heritability under the associated variation operators. We find that the representation that performs worst under random initialization is nevertheless the best overall performer given the heuristic initialization normally used in MOCK. This suggests there are strong interaction effects between initialization, representation and operators in this problem.
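
As a rough illustration of why search-space size matters when the number of clusters is left free, the sketch below counts the distinct partitions of n labelled data items into any number of non-empty clusters (the Bell number). This is a standard combinatorial fact and not the paper's own analysis; the function name `bell_number` is a hypothetical helper, and the count describes the underlying partition space, not the size of any particular encoding, which may be larger if the encoding is redundant.

```python
def bell_number(n: int) -> int:
    """Count partitions of n labelled items into non-empty, unlabelled clusters.

    Uses the Bell triangle: each row starts with the last entry of the
    previous row, and every further entry is the sum of its left neighbour
    and the entry above that neighbour.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    row = [1]  # row 0 of the Bell triangle; its first entry is B(0)
    for _ in range(n):
        next_row = [row[-1]]
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return row[0]


if __name__ == "__main__":
    # Growth of the partition space for a few small data set sizes.
    for n in (5, 10, 20):
        print(n, bell_number(n))
```

Even for 10 items there are already 115975 possible clusterings, which suggests why the heuristic bias introduced by a representation and its initialization can dominate performance.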
