A hybrid stochastic genetic–GRASP algorithm for clustering analysis

This paper presents a new stochastic methodology, based on the concepts of genetic algorithms (GAs) and the greedy randomized adaptive search procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm, Hybrid GEN–GRASP, works in two phases: a genetic algorithm solves the feature selection problem, and a GRASP algorithm solves the clustering problem. Owing to its stochastic, population-based search, the proposed algorithm can overcome the drawbacks of traditional clustering methods. Its performance is compared with that of another methodology, which uses a popular metaheuristic, Tabu Search, to solve the feature selection problem. Results are presented from the application of the methodology to data sets from the UCI Machine Learning Repository.
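The two-phase structure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Hybrid GEN–GRASP implementation. The GA encodes a feature subset as a boolean mask, and each mask is scored by running a simple GRASP clustering on the selected features only. Several pieces are hypothetical choices made for illustration: the GRASP construction (far-from-current-centres candidates filtered through a restricted candidate list controlled by `alpha`), the k-means-style local search, the between/within scatter ratio used as GA fitness, and all function and parameter names (`grasp_clustering`, `fitness`, `ga_feature_selection`, `alpha`, `p_mut`).

```python
import math
import random


def sq_dist(a, b, mask):
    """Squared Euclidean distance restricted to the features switched on in mask."""
    return sum((x - y) ** 2 for x, y, m in zip(a, b, mask) if m)


def grasp_clustering(data, k, mask, alpha=0.3, iters=20, rng=random):
    """Multi-start GRASP sketch: randomized greedy choice of k centres, then k-means local search.

    Returns the best (labels, within-cluster sum of squares) found over `iters` starts.
    """
    best_labels, best_cost = None, math.inf
    for _ in range(iters):
        # Construction: start from a random point, then draw each further centre from a
        # restricted candidate list (RCL) of points far from the centres chosen so far.
        centres = [rng.choice(data)]
        while len(centres) < k:
            gains = [min(sq_dist(p, c, mask) for c in centres) for p in data]
            hi, lo = max(gains), min(gains)
            threshold = hi - alpha * (hi - lo)  # alpha=0: pure greedy, alpha=1: pure random
            rcl = [p for p, g in zip(data, gains) if g >= threshold]
            centres.append(rng.choice(rcl))
        # Local search: Lloyd (k-means) iterations until assignments stop changing.
        labels = None
        for _ in range(100):  # cap guards against rare tie-induced cycling
            new = [min(range(k), key=lambda j: sq_dist(p, centres[j], mask)) for p in data]
            if new == labels:
                break
            labels = new
            for j in range(k):
                members = [p for p, lab in zip(data, labels) if lab == j]
                if members:  # an empty cluster keeps its previous centre
                    centres[j] = [sum(col) / len(members) for col in zip(*members)]
        cost = sum(sq_dist(p, centres[lab], mask) for p, lab in zip(data, labels))
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


def fitness(data, k, mask, rng):
    """Hypothetical GA fitness: between- / within-cluster scatter on the selected features."""
    if not any(mask):
        return 0.0
    _, within = grasp_clustering(data, k, mask, iters=5, rng=rng)
    overall = [sum(col) / len(data) for col in zip(*data)]
    total = sum(sq_dist(p, overall, mask) for p in data)
    return (total - within) / (within + 1e-9)


def ga_feature_selection(data, k, pop_size=8, generations=10, p_mut=0.1, seed=0):
    """Simple elitist GA over boolean feature masks (one-point crossover, bit-flip mutation)."""
    rng = random.Random(seed)
    n_feat = len(data[0])

    def repair(mask):
        # Guarantee that at least one feature is selected.
        if not any(mask):
            mask[rng.randrange(n_feat)] = True
        return mask

    pop = [repair([rng.random() < 0.5 for _ in range(n_feat)]) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half of the population, then refill it with offspring.
        elite = sorted(pop, key=lambda m: fitness(data, k, m, rng), reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_feat) if n_feat > 1 else 0
            child = [not g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(repair(child))
        pop = elite + children
    return max(pop, key=lambda m: fitness(data, k, m, rng))
```

As a usage example, on a toy data set where only the first feature separates two groups, `ga_feature_selection(data, 2)` is expected to favour the mask `[True, False]`. In this sketch the GA never sees cluster labels: it only compares how well GRASP can cluster the data under each candidate feature subset, which mirrors the division of labour between the two phases.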
