This paper presents a new clustering genetic algorithm for data mining applications. A simple encoding scheme that yields constant-length chromosomes is used, which allows the standard genetic operators to be applied. The algorithm is also designed to be consistent, avoiding the problems of redundant codification and context insensitivity. In addition, a very simple heuristic is used to generate the initial population. The fitness of each individual is determined from the Euclidean distances among the objects, as well as from the number of objects belonging to each cluster. The clustering genetic algorithm is evaluated on the Australian Credit Approval database. The algorithm performs well given the requirement that the resulting rule set remain low in complexity (at most five clusters). The experimental results indicate that the proposed method is promising.
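To make the encoding and fitness ideas concrete, the following is a minimal sketch and not the paper's implementation. It assumes a label-vector encoding (one integer gene per object, so every chromosome has the same length) and uses a silhouette-style score as one plausible fitness that depends on both Euclidean distances and cluster sizes. The abstract does not give the actual encoding or formula, so the names `canonicalize` and `silhouette_fitness`, and the choice of score, are hypothetical.

```python
import math


def canonicalize(chromosome):
    """Renumber labels in order of first appearance (assumed remedy for
    redundant codification: [2, 2, 1] and [1, 1, 2] become the same)."""
    mapping = {}
    result = []
    for gene in chromosome:
        if gene not in mapping:
            mapping[gene] = len(mapping) + 1
        result.append(mapping[gene])
    return result


def silhouette_fitness(points, chromosome):
    """Mean silhouette-style score of the partition encoded by `chromosome`.

    Assumption: this stands in for the paper's fitness, which is only
    described as using Euclidean distances and cluster sizes.
    """
    # Group object indices by their cluster label.
    clusters = {}
    for index, label in enumerate(chromosome):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        return -1.0  # a single cluster is treated as the worst case

    total = 0.0
    for i, label in enumerate(chromosome):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton objects contribute a score of 0
        # a: mean distance to the other members of the object's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i)
        a /= len(own) - 1
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


if __name__ == "__main__":
    # Toy data: two well-separated pairs of objects.
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(canonicalize([3, 3, 1, 1]))
    print(silhouette_fitness(points, [1, 1, 2, 2]))
    print(silhouette_fitness(points, [1, 2, 1, 2]))
```

Because every chromosome has one gene per object, one-point crossover and mutation can be applied directly under this assumed encoding. Whether the paper handles context insensitivity this way is not stated in the abstract.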