Automatic clustering using genetic algorithms

Abstract: Many clustering methods require the designer to supply the number of clusters as input. In general, however, the designer does not know this number beforehand. In this article, we develop a genetic-algorithm-based clustering method called automatic genetic clustering for unknown K (AGCUK). In the AGCUK algorithm, noising selection and division–absorption mutation are designed to balance selection pressure against population diversity. In addition, the Davies–Bouldin index is employed to measure the validity of clusters. Experimental results on artificial and real-life data sets illustrate the effectiveness of the AGCUK algorithm in automatically evolving the number of clusters and providing the clustering partition.
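To make the validity measure concrete, the following is a minimal sketch of the Davies–Bouldin index in its common form, assuming Euclidean distance and mean point-to-centroid scatter. The abstract does not give the exact parameterization used in AGCUK, so the function names (`euclidean`, `centroid`, `davies_bouldin`) and these choices are illustrative assumptions, not the authors' implementation. Lower values indicate compact, well-separated clusters.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition (assumed form: mean scatter,
    Euclidean centroid separation). `clusters` is a list of non-empty
    lists of points. Distinct centroids are assumed; coinciding
    centroids would cause a division by zero."""
    k = len(clusters)
    if k < 2:
        raise ValueError("the index needs at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter S_i: mean distance from each point to its cluster centroid.
    scatter = [
        sum(euclidean(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    total = 0.0
    for i in range(k):
        # R_i: worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data (hypothetical): the same two clusters, far apart and close together.
far = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
near = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
print(davies_bouldin(far), davies_bouldin(near))
```

In a genetic clustering setting, a value like this could serve as the basis of the fitness of a candidate partition (for example, by favoring individuals with a smaller index), which lets partitions with different numbers of clusters be compared on one scale.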
