A simulated annealing approach to find the optimal parameters for fuzzy clustering microarray data

Rapid advances in microarray technology are making it possible to analyze and manipulate large amounts of gene expression data. Clustering algorithms such as hierarchical clustering, self-organizing maps, k-means and fuzzy k-means have become important tools for the expression analysis of microarray data. However, fuzzy clustering requires prior knowledge of the number of clusters, k, and the fuzziness parameter, b, and this requirement limits its usage. Few approaches have been proposed for assigning the best possible values to these parameters. In this paper, we combine simulated annealing with fuzzy k-means clustering to determine the optimal values of both parameters, namely the number of clusters, k, and the fuzziness parameter, b. Our results show that a nearly optimal pair of k and b can be obtained without exploring the entire search space.
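The abstract does not state which objective the annealing minimises, so the following is only a minimal sketch under explicit assumptions. It uses standard fuzzy k-means (fuzzy c-means) with fuzziness b, the Xie-Beni validity index as a stand-in cost (an assumption, not necessarily the authors' criterion), and a hypothetical neighbour move, geometric cooling schedule and parameter ranges. The names `fuzzy_kmeans`, `xie_beni` and `anneal_k_b` are illustrative, not taken from the paper.

```python
import math
import random


def fuzzy_kmeans(data, k, b, iters=100, tol=1e-6, seed=0):
    """Fuzzy k-means (fuzzy c-means) with fuzziness b > 1.

    Returns (centroids, u) where u[i][j] is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, normalised so each point's memberships sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(k)]
    for j in range(n):
        s = sum(u[i][j] for i in range(k))
        for i in range(k):
            u[i][j] /= s
    centroids = []
    for _ in range(iters):
        # Centroid step: mean of the data weighted by u^b.
        centroids = []
        for i in range(k):
            w = [u[i][j] ** b for j in range(n)]
            tw = max(sum(w), 1e-300)  # guard against an empty cluster
            centroids.append([sum(w[j] * data[j][d] for j in range(n)) / tw
                              for d in range(dim)])
        # Membership step: u_ij proportional to d_ij^(-2/(b-1)), computed in log space
        # so that b close to 1 does not overflow.
        p = 2.0 / (b - 1.0)
        change = 0.0
        for j in range(n):
            dists = [math.dist(data[j], c) for c in centroids]
            if min(dists) == 0.0:
                # Point sits exactly on a centroid: give it crisp membership there.
                new = [1.0 if dd == 0.0 else 0.0 for dd in dists]
            else:
                logs = [-p * math.log(dd) for dd in dists]
                top = max(logs)
                new = [math.exp(x - top) for x in logs]
            s = sum(new)
            for i in range(k):
                change = max(change, abs(new[i] / s - u[i][j]))
                u[i][j] = new[i] / s
        if change < tol:
            break
    return centroids, u


def xie_beni(data, centroids, u):
    """Xie-Beni validity index (lower is better); an ASSUMED objective, needs k >= 2."""
    n = len(data)
    compact = sum(u[i][j] ** 2 * math.dist(data[j], c) ** 2
                  for i, c in enumerate(centroids) for j in range(n))
    sep = min(math.dist(a, c) ** 2
              for i, a in enumerate(centroids) for c in centroids[i + 1:])
    return math.inf if sep == 0.0 else compact / (n * sep)


def anneal_k_b(data, k_range=(2, 6), b_range=(1.1, 3.0),
               t0=1.0, alpha=0.9, steps=60, seed=0):
    """Simulated annealing over (k, b); returns (best_cost, k, b).

    Ranges, neighbour move and cooling schedule are hypothetical choices.
    """
    rng = random.Random(seed)

    def cost(k, b):
        centroids, u = fuzzy_kmeans(data, k, b, seed=seed)
        return xie_beni(data, centroids, u)

    k, b = k_range[0], (b_range[0] + b_range[1]) / 2
    cur = cost(k, b)
    best = (cur, k, b)
    t = t0
    for _ in range(steps):
        # Neighbour: move k by -1/0/+1 and b by a small uniform step, clamped to range.
        nk = min(max(k + rng.choice((-1, 0, 1)), k_range[0]), k_range[1])
        nb = min(max(b + rng.uniform(-0.2, 0.2), b_range[0]), b_range[1])
        new = cost(nk, nb)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new <= cur or rng.random() < math.exp(-(new - cur) / t):
            k, b, cur = nk, nb, new
            if cur < best[0]:
                best = (cur, k, b)
        t *= alpha  # geometric cooling
    return best
```

Called as `anneal_k_b(profiles)` on a list of equal-length expression vectors, the sketch returns the lowest cost it saw together with the (k, b) that produced it. It performs exactly `steps + 1` fuzzy k-means runs rather than one run per point of a k-by-b grid, which mirrors the abstract's point about not exploring the whole search space.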
