Analyzing the Escherichia coli gene expression data by a multilayer adjusted tree organizing map

With DNA microarray technology, biologists now have thousands of arrays of expression data available. Discovering the functional relationships between genes and their involvement in biological processes depends on the ability to process and quantitatively analyze large amounts of array data efficiently. Clustering algorithms are among the popular tools that can help biologists achieve these goals. Although some existing research projects have applied clustering algorithms to biological data, none of them has examined the Escherichia coli (E. coli) gene expression data. This paper proposes a clustering algorithm called the Multilayer Adjusted Tree Organizing Map (MATOM) to analyze the E. coli gene expression data. Working in a semi-supervised manner, MATOM constructs a multilayer map and, at the same time, removes noisy data from the previously trained maps to improve the training process. This paper then presents the clustering results produced by MATOM and by other existing clustering algorithms on the E. coli gene expression data, together with a new evaluation method for assessing them. The results show that MATOM performs best in terms of the percentage of genes that are clustered correctly.
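The abstract does not describe the new evaluation method itself. For reference only, a conventional way to turn a clustering into a "percentage of genes clustered correctly", given known functional classes, is cluster purity: each cluster is credited with the genes that carry its most common class label. The sketch below assumes plain Python lists of cluster assignments and known class labels; the function name `purity` and the toy inputs are hypothetical, and this is not MATOM's evaluation method.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Return the percentage of genes whose label matches their cluster's majority label.

    cluster_ids[i] is the cluster assigned to gene i; true_labels[i] is its
    known class. Both inputs are hypothetical, for illustration only.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        return 0.0
    # Group the known class labels by the cluster each gene was assigned to.
    members = {}
    for cluster, label in zip(cluster_ids, true_labels):
        members.setdefault(cluster, []).append(label)
    # In each cluster, count only the genes carrying that cluster's most common label.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return 100.0 * correct / len(true_labels)


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 2]
    labels = ["A", "A", "B", "B", "B", "A"]
    print(f"{purity(clusters, labels):.1f}%")  # prints 83.3%
```

Purity rewards many small clusters (one gene per cluster always scores 100%), so it is normally reported alongside the number of clusters rather than on its own.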
