Clustering of Gene Expression Data: Performance and Similarity Analysis

Recent advances in DNA microarray technology allow the expression profiles of thousands of genes to be monitored simultaneously. However, the analysis and handling of such fast-growing data is becoming the major bottleneck in the utilization of the technology. Clustering analysis is one of the most effective methods for analyzing such gene expression data. In this paper, we first experimentally study three major clustering algorithms, hierarchical clustering, the self-organizing map (SOM), and the self-organizing tree algorithm (SOTA), using yeast Saccharomyces cerevisiae gene expression data, and compare their performance. We then present a data mining tool, cluster diff, which allows similarity analysis of clusters generated by different algorithms. A case study is conducted based on clusters generated by SOTA and SOM.
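To make the idea of comparing clusters from two algorithms concrete, the following is a minimal sketch and not the cluster diff tool itself, whose method is not described here. It assumes each algorithm's output is available as a mapping from gene identifier to cluster label, and it uses the Jaccard index as a stand-in similarity measure. The function names, gene identifiers, and labels are all hypothetical.

```python
from collections import defaultdict


def clusters_from_labels(labels):
    """Group gene identifiers by the cluster label assigned to them."""
    clusters = defaultdict(set)
    for gene, label in labels.items():
        clusters[label].add(gene)
    return dict(clusters)


def jaccard(a, b):
    """Jaccard index of two gene sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(labels_a, labels_b):
    """For each cluster of A, find the most similar cluster of B.

    Returns {label_a: (label_b, jaccard_score)}.
    """
    clusters_a = clusters_from_labels(labels_a)
    clusters_b = clusters_from_labels(labels_b)
    result = {}
    for label_a, genes_a in clusters_a.items():
        best = max(clusters_b, key=lambda lb: jaccard(genes_a, clusters_b[lb]))
        result[label_a] = (best, jaccard(genes_a, clusters_b[best]))
    return result


# Toy assignments (hypothetical), standing in for SOTA and SOM output.
sota_labels = {"gene1": 0, "gene2": 0, "gene3": 1, "gene4": 1, "gene5": 1}
som_labels = {"gene1": "a", "gene2": "a", "gene3": "a", "gene4": "b", "gene5": "b"}

matches = best_matches(sota_labels, som_labels)
for label, (partner, score) in matches.items():
    print(f"cluster {label} best matches cluster {partner} (Jaccard {score:.2f})")
```

A per-cluster best match of this kind shows which clusters the two algorithms agree on and which are split or merged. A single global agreement score, such as the adjusted Rand index, would summarize the whole partition instead.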
