Improving the Scalability of EA Techniques: A Case Study in Clustering

This paper studies how evolutionary algorithms (EAs) scale with growing genome size when used for similarity-based clustering. A simple EA and EAs that incorporate problem-dependent knowledge are evaluated experimentally on clustering problems with up to 100,000 objects. We find that EAs with problem-dependent crossover or hybridization scale near-linearly in the size of the similarity matrix, while the simple EA, even with problem-dependent initialization, fails at moderately large genome sizes.
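To make the "simple EA" baseline concrete, the following is a minimal sketch and not the paper's implementation. It assumes a genome that stores one cluster label per object, a fitness that sums the pairwise similarities of objects sharing a label, and generic operators (binary tournament selection, uniform crossover, per-gene mutation, elitism). The names `fitness` and `simple_ea` and all parameter values are hypothetical, chosen only for illustration.

```python
import random


def fitness(genome, sim):
    """Assumed objective: sum of similarities over pairs sharing a cluster label.

    One evaluation visits every object pair, so its cost grows with the
    size of the similarity matrix.
    """
    total = 0.0
    n = len(genome)
    for i in range(n):
        for j in range(i + 1, n):
            if genome[i] == genome[j]:
                total += sim[i][j]
    return total


def simple_ea(sim, k=2, pop_size=20, generations=50, p_mut=0.05, seed=0):
    """Generic EA with no problem-dependent knowledge (illustrative only)."""
    rng = random.Random(seed)
    n = len(sim)
    # Genome: one cluster label in [0, k) per object.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda g: fitness(g, sim))
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best genome over unchanged
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=lambda g: fitness(g, sim))
            p2 = max(rng.sample(pop, 2), key=lambda g: fitness(g, sim))
            # Uniform crossover: each gene comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: reassign a gene to a random cluster with probability p_mut.
            child = [rng.randrange(k) if rng.random() < p_mut else c for c in child]
            children.append(child)
        pop = children
        best = max(pop, key=lambda g: fitness(g, sim))
    return best


if __name__ == "__main__":
    # Toy similarity matrix: objects {0, 1} and {2, 3} form two natural groups.
    sim = [
        [0, 1, -1, -1],
        [1, 0, -1, -1],
        [-1, -1, 0, 1],
        [-1, -1, 1, 0],
    ]
    best = simple_ea(sim)
    print(best, fitness(best, sim))
```

In this sketch the genome length equals the number of objects, so the search space and the cost of each fitness evaluation both grow with the problem size; problem-dependent crossover or hybridization with local search, as studied in the paper, would replace the generic operators used here.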
