A niching genetic k-means algorithm and its applications to gene expression data

Partitional clustering is a common approach to cluster analysis. Although many algorithms have been proposed, partitional clustering remains challenging: it is difficult to recover high-quality solutions, as measured by the criterion function, both reliably and efficiently. In this paper, we propose a niching genetic k-means algorithm (NGKA) for partitional clustering, which aims to identify high-quality solutions reliably and efficiently in terms of the sum of squared errors criterion. Within the NGKA, we design a niching method that encourages mating among similar clustering solutions while allowing some competition among dissimilar solutions, and we integrate it into a genetic algorithm to prevent premature convergence during the evolutionary clustering search. Further, we incorporate one step of the k-means operation into the regeneration steps of the resulting niching genetic algorithm to improve its computational efficiency. The proposed algorithm was applied to both simulated data and gene expression data and compared with previous work. Experimental results clearly show that the NGKA is an effective clustering algorithm and that it outperforms two other genetic-algorithm-based clustering methods implemented for comparison.
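
To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of the sum of squared errors criterion and of a single k-means step applied to one candidate solution. It is an illustration under assumptions, not the authors' implementation: the label-vector encoding of a clustering solution and the function names `centroids`, `sse`, and `kmeans_step` are hypothetical, and the niching and genetic operators are not shown.

```python
from typing import List, Optional, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def centroids(data: List[Point], labels: List[int], k: int) -> List[Optional[List[float]]]:
    """Mean of each cluster; None for a cluster with no members."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(data, labels):
        counts[c] += 1
        for j, v in enumerate(x):
            sums[c][j] += v
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]


def sse(data: List[Point], labels: List[int], k: int) -> float:
    """Sum of squared errors: squared distance of each point to its own centroid."""
    cents = centroids(data, labels, k)
    return sum(sq_dist(x, cents[c]) for x, c in zip(data, labels))


def kmeans_step(data: List[Point], labels: List[int], k: int) -> List[int]:
    """One k-means step: compute centroids from the current labels, then
    reassign every point to its nearest non-empty centroid."""
    cents = centroids(data, labels, k)
    live = [c for c in range(k) if cents[c] is not None]
    return [min(live, key=lambda c: sq_dist(x, cents[c])) for x in data]


# Assumed toy data: two well-separated pairs, deliberately mislabeled.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = [0, 1, 0, 1]
improved = kmeans_step(data, labels, k=2)
print(sse(data, labels, 2), improved, sse(data, improved, 2))
```

A single step of this kind never increases the SSE of a solution, which is one plausible reason a hybrid search can need fewer generations than a purely genetic one. How the step is interleaved with selection, crossover, and niching is specific to the paper and is not reproduced here.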
