An Efficient GA-based Clustering Technique

In this paper, we propose a GA-based unsupervised clustering technique that selects cluster centers directly from the data set. This choice speeds up fitness evaluation in two ways: a look-up table holding the distances between all pairs of data points is constructed in advance, and a binary representation, rather than a string representation, is used to encode a variable number of cluster centers. More effective versions of the reproduction, crossover, and mutation operators are introduced. Finally, the Davies-Bouldin index is employed to measure the validity of the clusters. The algorithm properly clusters a variety of data sets. Experimental results show that, compared with other GA-based clustering algorithms, the proposed algorithm provides more stable clustering performance in terms of both the number of clusters and the clustering results, and requires considerably less computational time.
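The fitness evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes Euclidean distance, distinct data points, and a chromosome with one bit per data point, where a 1 marks that point as a cluster center. The function names (`pairwise_distances`, `decode`, `assign`, `davies_bouldin`) are hypothetical, and cluster scatter is measured here as the mean distance from members to the selected center point, which is an assumption made so that every quantity comes from the look-up table.

```python
import numpy as np

def pairwise_distances(X):
    """Build the look-up table once: D[i, j] is the Euclidean
    distance between data points i and j."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def decode(chromosome):
    """Return the indices of the data points whose bit is set,
    i.e. the points chosen as cluster centers."""
    return np.flatnonzero(chromosome)

def assign(D, centers):
    """Assign every point to its nearest selected center using
    only table look-ups (no distance is recomputed)."""
    return centers[np.argmin(D[:, centers], axis=1)]

def davies_bouldin(D, chromosome):
    """Davies-Bouldin index of the partition encoded by a binary
    chromosome; lower values indicate better clusterings.
    Assumes distinct data points, so no cluster is empty."""
    centers = decode(chromosome)
    if len(centers) < 2:
        return np.inf  # the index is undefined for fewer than two clusters
    owner = assign(D, centers)
    # Scatter of each cluster: mean distance from its members to its center.
    S = np.array([D[owner == c, c].mean() for c in centers])
    # Separation between centers, also read from the table (fancy indexing copies).
    M = D[np.ix_(centers, centers)]
    np.fill_diagonal(M, np.inf)  # exclude the i == j pairs from the maximum
    R = (S[:, None] + S[None, :]) / M
    return R.max(axis=1).mean()
```

Because the centers are data points, each candidate solution is scored by indexing into `D` rather than recomputing distances, which is the source of the speed-up the abstract refers to. A GA would call `davies_bouldin(D, chromosome)` for every individual in the population, building `D` only once.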
