Harmonious Genetic Clustering

To automatically determine the number of clusters and generate higher-quality clusters when clustering data samples, we propose a harmonious genetic clustering algorithm, named HGCA, which is based on harmonious mating in eugenic theory. Unlike existing genetic clustering methods, which use only fitness, HGCA selects the most suitable mate for each chromosome and takes into account each chromosome's gender, age, and fitness when computing mating attractiveness. To avoid illegal mating, we design three mating prohibition schemes: no mating prohibition, mating prohibition based on lineal relativeness, and mating prohibition based on collateral relativeness. For harmonious mating, we design three mating strategies: a greedy eugenics-based mating strategy, a eugenics-based mating strategy based on weighted bipartite matching, and a eugenics-based mating strategy based on unweighted bipartite matching. In particular, a novel single-point crossover operator called variable-length-and-gender-balance crossover is devised to probabilistically guarantee the balance between the population gender ratio and the dynamics of chromosome lengths. We evaluate the proposed approach on real-life and artificial datasets, and the results show that our algorithm outperforms existing genetic clustering methods in terms of robustness, efficiency, and effectiveness.
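
As an illustration of the mate-selection idea, the following minimal sketch pairs chromosomes greedily under a lineal-relativeness prohibition. It is not the HGCA implementation: the `Chromosome` fields, the `attractiveness` formula (partner fitness discounted linearly by age), the `max_age` parameter, and the rule that fitter females choose first are all assumptions made for the example, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Chromosome:
    ident: int
    gender: str                      # "F" or "M"
    age: int                         # generations survived so far
    fitness: float                   # higher is better (assumed)
    parents: Tuple[int, ...] = ()    # identifiers of direct parents


def attractiveness(a: Chromosome, b: Chromosome, max_age: int = 5) -> float:
    """Hypothetical score of b as a mate for a: fitness discounted by age."""
    if a.gender == b.gender:
        return 0.0
    age_factor = max(0.0, 1.0 - b.age / max_age)
    return b.fitness * age_factor


def lineal_relatives(a: Chromosome, b: Chromosome) -> bool:
    """True if one chromosome is a direct parent of the other."""
    return a.ident in b.parents or b.ident in a.parents


def greedy_mating(
    population: List[Chromosome],
    prohibited: Callable[[Chromosome, Chromosome], bool] = lineal_relatives,
) -> List[Tuple[int, int]]:
    """Greedy pairing: fitter females pick first among legal, unpaired males."""
    females = sorted(
        (c for c in population if c.gender == "F"),
        key=lambda c: c.fitness,
        reverse=True,
    )
    males = [c for c in population if c.gender == "M"]
    taken = set()
    pairs = []
    for f in females:
        candidates = [
            m for m in males if m.ident not in taken and not prohibited(f, m)
        ]
        if not candidates:
            continue  # no legal mate remains for this female
        best = max(candidates, key=lambda m: attractiveness(f, m))
        taken.add(best.ident)
        pairs.append((f.ident, best.ident))
    return pairs


population = [
    Chromosome(1, "F", age=1, fitness=0.9),
    Chromosome(2, "F", age=0, fitness=0.5, parents=(3,)),  # child of male 3
    Chromosome(3, "M", age=2, fitness=0.8),
    Chromosome(4, "M", age=0, fitness=0.6),
    Chromosome(5, "M", age=1, fitness=0.4),
]
pairs = greedy_mating(population)
print(pairs)  # [(1, 4), (2, 5)]
```

In this toy population, female 1 prefers the younger male 4 over the fitter but older male 3, and female 2 is barred from mating with its parent, male 3. A bipartite-matching strategy would instead optimize over all pairs at once; passing `prohibited=lambda a, b: False` corresponds to the no-prohibition scheme.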
