A new variable-length genome genetic algorithm for data clustering in semeiotics

This paper introduces a new evolutionary algorithm for data clustering, the Self-sizing Genome Genetic Algorithm. It is akin to a messy Genetic Algorithm and requires no a priori information about the number of clusters. A new recombination operator, gene-pooling, is introduced, and fitness is based on simultaneously maximizing intra-cluster homogeneity and inter-cluster separability. The algorithm is applied to clustering in dermatological semeiotics. Moreover, a Pathology Addressing Index is defined to quantify how useful the clusters found are in pointing unambiguously to specific pathologies. A comparison with other clustering tools is also performed.
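
To make the fitness idea concrete, the following is a minimal sketch of how a variable-length genome could be scored on homogeneity and separability together. It is an illustration only and not the fitness function of the Self-sizing Genome Genetic Algorithm: the genome encoding (a list of centroids whose length sets the number of clusters), the Euclidean distance, the function name `fitness`, and the simple "separation minus spread" combination are all assumptions made for this example.

```python
import math

def fitness(genome, data):
    """Score a candidate clustering (illustrative assumption, not the paper's formula).

    genome: variable-length list of centroid tuples; its length is the
            number of clusters proposed by this individual.
    data:   list of point tuples with the same dimensionality.
    """
    # Assign every point to its nearest centroid.
    clusters = [[] for _ in genome]
    for point in data:
        nearest = min(range(len(genome)), key=lambda i: math.dist(point, genome[i]))
        clusters[nearest].append(point)

    # Ignore centroids that attracted no points.
    used = [(c, pts) for c, pts in zip(genome, clusters) if pts]

    # Intra-cluster spread: mean distance of points to their own centroid.
    # Lower spread means higher homogeneity.
    spread = sum(math.dist(p, c) for c, pts in used for p in pts) / len(data)

    # Inter-cluster separation: mean pairwise distance between used centroids.
    if len(used) < 2:
        separation = 0.0
    else:
        pairs = [(a, b) for i, (a, _) in enumerate(used) for (b, _) in used[i + 1:]]
        separation = sum(math.dist(a, b) for a, b in pairs) / len(pairs)

    # Reward separation, penalize spread (hypothetical combination).
    return separation - spread

# Two well-separated groups of points.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
two_clusters = [(0.0, 0.5), (10.0, 0.5)]   # genome with two genes
one_cluster = [(5.0, 0.5)]                 # genome with one gene
print(fitness(two_clusters, data) > fitness(one_cluster, data))
```

Because genomes of different lengths are scored by the same function, individuals proposing different numbers of clusters can compete directly, which is the property that lets an algorithm of this kind work without fixing the cluster count in advance.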