A genetic k-medoids clustering algorithm

We propose a hybrid genetic algorithm for k-medoids clustering. A novel heuristic operator is designed and integrated with the genetic algorithm to fine-tune the search. Further, variable length individuals that encode different number of medoids (clusters) are used for evolution with a modified Davies-Bouldin index as a measure of the fitness of the corresponding partitionings. As a result the proposed algorithm can efficiently evolve appropriate partitionings while making no a priori assumption about the number of clusters present in the datasets. In the experiments, we show the effectiveness of the proposed algorithm and compare it with other related clustering methods.

[1]  Paul Scheunders,et al.  A genetic c-Means clustering algorithm applied to color image quantization , 1997, Pattern Recognit..

[2]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[3]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[4]  R HruschkaEduardo,et al.  A genetic algorithm for cluster analysis , 2003 .

[5]  Ronald W. Davis,et al.  A genome-wide transcriptional analysis of the mitotic cell cycle. , 1998, Molecular cell.

[6]  Nelson F. F. Ebecken,et al.  A genetic algorithm for cluster analysis , 2003, Intell. Data Anal..

[7]  Donald W. Bouldin,et al.  A Cluster Separation Measure , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  David E. Goldberg,et al.  Genetic Algorithms with Sharing for Multimodalfunction Optimization , 1987, ICGA.

[9]  Philip K. Hopke,et al.  The use of sampling to cluster large data sets , 1990 .

[10]  G. Church,et al.  Systematic determination of genetic network architecture , 1999, Nature Genetics.

[11]  Ujjwal Maulik,et al.  An evolutionary technique based on K-Means algorithm for optimal clustering in RN , 2002, Inf. Sci..

[12]  C. A. Murthy,et al.  In search of optimal clusters using genetic algorithms , 1996, Pattern Recognit. Lett..

[13]  Rita Cucchiara,et al.  Genetic algorithms for clustering in machine vision , 1998, Machine Vision and Applications.

[14]  Ujjwal Maulik,et al.  Genetic algorithm-based clustering technique , 2000, Pattern Recognit..

[15]  J. C. W. Debuse,et al.  An Effective Genetic Algorithm for the Fixed Channel Assignment Problem , 2001 .

[16]  Yi Lu,et al.  Incremental genetic K-means algorithm and its application in gene expression data analysis , 2004, BMC Bioinformatics.

[17]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[18]  Jiawei Han,et al.  CLARANS: A Method for Clustering Objects for Spatial Data Mining , 2002, IEEE Trans. Knowl. Data Eng..

[19]  C. B. Lucasius,et al.  On k-medoid clustering of large data sets with the aid of a genetic algorithm: background, feasiblity and comparison , 1993 .

[20]  James C. Bezdek,et al.  On cluster validity for the fuzzy c-means model , 1995, IEEE Trans. Fuzzy Syst..

[21]  Zhen Yang,et al.  Effective Memetic Algorithms for VLSI Design = Genetic Algorithms Local Search Multi-Level Clustering , 2004, Evolutionary Computation.

[22]  Ricardo J. G. B. Campello,et al.  Evolutionary algorithms for clustering gene-expression data , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[23]  Maurice K. Wong,et al.  Algorithm AS136: A k-means clustering algorithm. , 1979 .

[24]  David G. Stork,et al.  Pattern Classification , 1973 .

[25]  Peter J. Rousseeuw,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .

[26]  Hong Yan,et al.  Cluster analysis of gene expression data based on self-splitting and merging competitive learning , 2004, IEEE Transactions on Information Technology in Biomedicine.

[27]  G. N. Lance,et al.  A general theory of classificatory sorting strategies: II. Clustering systems , 1967, Comput. J..

[28]  R. Plackett,et al.  THE DESIGN OF OPTIMUM MULTIFACTORIAL EXPERIMENTS , 1946 .

[29]  James C. Bezdek,et al.  Clustering with a genetically optimized approach , 1999, IEEE Trans. Evol. Comput..

[30]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[31]  Doulaye Dembélé,et al.  Fuzzy C-means Method for Clustering Microarray Data , 2003, Bioinform..

[32]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[33]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[34]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[35]  M. Narasimha Murty,et al.  Genetic K-means algorithm , 1999, IEEE Trans. Syst. Man Cybern. Part B.

[36]  Emanuel Falkenauer,et al.  Genetic Algorithms and Grouping Problems , 1998 .

[37]  John H. Holland,et al.  Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[38]  Micha Sharir,et al.  The Discrete 2-Center Problem , 1997, SCG '97.