A Genetic K-means Clustering Algorithm Applied to Gene Expression Data

One of the main current strategies for understanding a biological process at the genome level is to cluster genes by their expression data obtained from DNA microarray experiments. The classic K-means clustering algorithm is a deterministic local search and may therefore terminate in a clustering that is only locally optimal. This paper describes a genetic K-means clustering algorithm, called GKMCA, for clustering gene expression datasets. GKMCA is a hybridization of a genetic algorithm (GA) and the iterative optimal K-means algorithm (IOKMA). In GKMCA, each individual is encoded by a partition table that uniquely determines a clustering, and three genetic operators (selection, crossover, and mutation) are employed together with an IOKM operator derived from IOKMA. The superiority of GKMCA over IOKMA, and over other GA-clustering algorithms that lack the IOKM operator, is demonstrated on two real gene expression datasets.
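The hybrid scheme outlined above can be illustrated with a minimal Python sketch. It is not the GKMCA implementation: the abstract does not specify the operators, so everything below is an assumption made for illustration. The partition table is taken to be a list of cluster labels (one per gene), selection is a binary tournament, crossover is one-point, mutation reassigns a gene to a random cluster, and a single standard K-means reassignment step stands in for the IOKM operator. All function names and parameter defaults are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroids(data, labels, k):
    """Mean profile of each cluster; None marks an empty cluster."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point, c in zip(data, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += point[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]

def sse(data, labels, k):
    """Fitness (lower is better): within-cluster sum of squared errors."""
    cents = centroids(data, labels, k)
    return sum(sq_dist(p, cents[c]) for p, c in zip(data, labels))

def kmeans_step(data, labels, k):
    """Assumed stand-in for the IOKM operator: one ordinary K-means
    reassignment of every gene to its nearest non-empty centroid."""
    cents = centroids(data, labels, k)
    live = [c for c in range(k) if cents[c] is not None]
    return [min(live, key=lambda c: sq_dist(p, cents[c])) for p in data]

def gkm_sketch(data, k, pop_size=20, generations=30, p_mut=0.02, seed=0):
    """Toy GA + K-means hybrid; needs at least two data points."""
    rng = random.Random(seed)
    n = len(data)
    # Each individual is a partition table: labels[i] is the cluster of gene i.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=lambda ind: sse(data, ind, k))
    best_fit = sse(data, best, k)
    for _ in range(generations):
        fit = [sse(data, ind, k) for ind in pop]

        def pick():
            # Selection: binary tournament on SSE.
            i, j = rng.sample(range(pop_size), 2)
            return pop[i] if fit[i] <= fit[j] else pop[j]

        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(k) if rng.random() < p_mut else c
                     for c in child]             # mutation
            nxt.append(kmeans_step(data, child, k))  # local refinement
        pop = nxt
        cand = min(pop, key=lambda ind: sse(data, ind, k))
        cand_fit = sse(data, cand, k)
        if cand_fit < best_fit:                  # keep the best seen so far
            best, best_fit = cand, cand_fit
    return best, best_fit
```

Calling `gkm_sketch(data, k)` with `data` as a list of equal-length numeric tuples returns the best partition table found and its SSE. The K-means step gives each offspring a local refinement, while the genetic operators supply the global exploration that plain K-means lacks.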
