Automatic Clustering with Self-Organizing Maps and Genetic Algorithms

Analyzing a data set of unknown characteristics usually requires identifying subsets (clusters) of the data such that the members of each cluster share common characteristics in some sense. To do this we must determine (a) the number of clusters, (b) the clusters themselves, and (c) the labeling of every element in the data set, such that each element belongs to exactly one cluster. We discuss an algorithm that solves (b) and (c); we assume that (a) is given. We show that the so-called labeling problem may be solved by minimizing an adequate measure of distance. We discuss several such metrics and the corresponding (genetic) minimization algorithm, and we offer some results derived from its application.
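
To make the idea of labeling by distance minimization concrete, the following is a minimal illustrative sketch, not the algorithm described in this paper. It assumes that the number of clusters k is given, that the measure being minimized is the sum of squared Euclidean distances from each element to its nearest centroid, and that the genetic algorithm uses truncation selection, whole-centroid crossover and Gaussian mutation. All function names and parameter values (ga_cluster, pop_size, sigma and so on) are hypothetical choices made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_labels(points, centroids):
    """Label each point with the index of its nearest centroid,
    so every element belongs to exactly one cluster."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def cost(points, centroids):
    """Assumed distance measure: sum over all points of the squared
    distance to the nearest centroid (lower is better)."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def ga_cluster(points, k, pop_size=30, generations=60, sigma=0.3, seed=0):
    """Evolve k centroids with a small elitist genetic algorithm.

    An individual is a list of k centroids. Each generation keeps the
    better half unchanged and refills the population with mutated
    offspring of two parents drawn from that half.
    """
    rng = random.Random(seed)
    # Initialize every centroid at a randomly chosen data point.
    population = [
        [list(rng.choice(points)) for _ in range(k)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda ind: cost(points, ind))
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Crossover: take each centroid from one parent or the other.
            child = [list(rng.choice((ca, cb))) for ca, cb in zip(a, b)]
            # Mutation: perturb every coordinate with Gaussian noise.
            child = [[x + rng.gauss(0, sigma) for x in c] for c in child]
            children.append(child)
        population = elite + children
    best = min(population, key=lambda ind: cost(points, ind))
    return best, assign_labels(points, best)
```

Calling ga_cluster(points, k) on a list of numeric tuples returns the best centroids found together with one label per element. Other distance measures can be tried by replacing cost, since the rest of the sketch treats it as a black box.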