Clustering Microarray Data by Using a Stochastic Algorithm

The clustering of gene expression data is used to analyze the results of microarray studies, and it often helps reveal how a particular class of genes functions together during a biological process. In this study, we performed clustering with the Markov cluster (MCL) algorithm, a graph clustering method based on the simulation of stochastic flow. MCL is fast and efficient: it clusters the nodes of a graph by simulating random walks, repeatedly computing transition probabilities between nodes. First, we converted the raw expression matrix into a sample-by-sample matrix of Euclidean distances computed over the genes. Second, we applied the MCL algorithm to this distance matrix and considered two factors: the inflation parameter and the diagonal terms of the matrix. We tuned these factors through extensive experiments. In addition, distance thresholds, namely the mean of the elements in each column, were used to distinguish clearly between groups. Our experiments show an average accuracy of about 70% with respect to the previously known classes. We also compared the MCL algorithm with the self-organizing map (SOM), K-means, and hierarchical clustering (HC) algorithms.
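The pipeline described above (distance matrix, column-mean thresholding, MCL with an inflation parameter and diagonal terms, comparison with known classes) can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: entries above their column mean are treated as "no edge", kept distances are turned into similarities with 1/(1 + d), the diagonal is set to a hypothetical self-loop weight `diag`, and accuracy is computed as cluster purity (each cluster credited with its majority class). All function names are hypothetical. Plain Python is used so the sketch runs without third-party packages.

```python
import math
from collections import Counter

def distance_matrix(samples):
    """Pairwise Euclidean distances between samples (each a list of gene values)."""
    return [[math.dist(a, b) for b in samples] for a in samples]

def build_graph(D, diag=1.0):
    """Hypothetical graph construction (assumption): keep an entry only if it is at or
    below its column mean, convert kept distances to similarities 1/(1+d), and put
    `diag` on the diagonal as the self-loop weight."""
    n = len(D)
    col_means = [sum(D[i][j] for i in range(n)) / n for j in range(n)]
    W = [[1.0 / (1.0 + D[i][j]) if D[i][j] <= col_means[j] else 0.0
          for j in range(n)] for i in range(n)]
    for i in range(n):
        W[i][i] = diag  # diagonal terms; positive values keep every column non-empty
    return W

def normalize_columns(M):
    """Rescale each column to sum to 1 (column-stochastic transition matrix)."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mcl(W, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion (matrix squaring) and inflation (element-wise power
    followed by column renormalization) until the matrix stops changing."""
    M = normalize_columns(W)
    n = len(M)
    for _ in range(max_iter):
        expanded = matmul(M, M)
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = inflated
        if change < tol:
            break
    return M

def extract_clusters(M, eps=1e-6):
    """Each row that keeps mass (an attractor) defines one cluster: the columns it holds."""
    clusters = []
    for row in M:
        members = [j for j, x in enumerate(row) if x > eps]
        if members and members not in clusters:
            clusters.append(members)
    return clusters

def cluster_accuracy(clusters, labels):
    """Purity-style score (assumption): each cluster is credited with its majority class."""
    correct = sum(Counter(labels[j] for j in c).most_common(1)[0][1] for c in clusters)
    return correct / len(labels)

# Toy data: six samples measured on two genes, forming two well-separated groups.
samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = ["A", "A", "A", "B", "B", "B"]
M = mcl(build_graph(distance_matrix(samples), diag=1.0), inflation=2.0)
clusters = extract_clusters(M)
print(clusters, cluster_accuracy(clusters, labels))
```

Raising `inflation` generally produces more, smaller clusters, and the self-loop weight `diag` also affects granularity, which is consistent with the need to tune both factors experimentally.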
