Genetic algorithms for clustering, feature selection and classification

Traditional clustering methods, such as the K-means algorithm and its variants, usually require the user to supply the number of clusters. In general, however, this number is not known in advance. Clustering therefore becomes a tedious trial-and-error process, and the results are often unsatisfactory, especially when the number of clusters is large. In this paper we propose a genetic algorithm for the clustering problem. The algorithm groups the data according to their similarity and automatically determines an appropriate number of clusters. Traditional classification methods usually represent each class by a single set of parameters. In many cases, however, data belonging to the same class fall into several clusters, and the data in each cluster may have different characteristics. We therefore also apply the genetic algorithm to the classification problem and obtain good results, especially in this situation. A second genetic algorithm is proposed for the feature selection problem. It searches not only for a good set of features but also for a weight for each feature, so that using the weighted features in classification achieves a good classification rate. Experimental results are given to illustrate the effectiveness of these genetic algorithms.
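
To make the clustering idea concrete, the following is a minimal sketch of how a genetic algorithm can choose the number of clusters by itself. It is an illustration under our own assumptions, not the algorithm proposed in this paper: the chromosome encoding (a bit mask marking which data points act as cluster centers), the cost function (distance to the nearest center plus a hypothetical per-cluster `penalty`), and all names and parameter values are chosen for the example only.

```python
import math
import random


def cost(mask, points, penalty):
    """Sum of distances to the nearest selected center, plus a per-center penalty."""
    centers = [p for p, bit in zip(points, mask) if bit]
    if not centers:
        return float("inf")  # a chromosome with no centers is invalid
    spread = sum(min(math.dist(p, c) for c in centers) for p in points)
    return spread + penalty * len(centers)


def evolve(points, penalty=2.0, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Evolve bit masks over the data points; the 1-bits mark cluster centers."""
    rng = random.Random(seed)
    n = len(points)

    # Random initial population; every mask is forced to select at least one center.
    pop = []
    for _ in range(pop_size):
        mask = [1 if rng.random() < 0.2 else 0 for _ in range(n)]
        if not any(mask):
            mask[rng.randrange(n)] = 1
        pop.append(mask)

    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        pop.sort(key=lambda m: cost(m, points, penalty))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children

    best = min(pop, key=lambda m: cost(m, points, penalty))
    return [p for p, bit in zip(points, best) if bit]


# Usage: three well-separated groups of 2-D points (made-up data).
points = [(x + dx, y + dy)
          for (x, y) in [(0, 0), (10, 0), (5, 9)]
          for (dx, dy) in [(0, 0), (0.5, 0.2), (-0.3, 0.4), (0.1, -0.5)]]
centers = evolve(points)
print(len(centers), "centers selected:", centers)
```

The number of clusters is never passed in: it emerges from the trade-off between the spread term and the penalty term, so a larger `penalty` favors fewer clusters. The result depends on the random seed and on these parameter choices.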