Adaptive Double Self-Organizing Map for Clustering Gene Expression Data

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at The University of Maine, I agree that the Library shall make it freely available for inspection. I further agree that the Librarian may grant permission for "fair use" copying of this thesis for scholarly purposes. It is understood that any copying or publication of this thesis for financial gain shall not be allowed without my written permission.

This thesis presents a novel clustering technique, the adaptive double self-organizing map (ADSOM), that addresses the problem of identifying the "correct" number of clusters. ADSOM has a flexible topology and performs clustering and cluster visualization simultaneously, so it requires no a priori knowledge of the number of clusters. ADSOM combines features of the popular self-organizing map with two-dimensional position vectors, which serve as a visualization tool for deciding the number of clusters. It updates its free parameters during training, and its position vectors converge to a fairly consistent number of clusters provided that its initial number of nodes is greater than the expected number of clusters. A novel index is introduced based on hierarchical clustering of the final locations of the position vectors. The index allows automated detection of the number of clusters, thereby reducing the human error that could be incurred by counting clusters visually. The reliability of ADSOM in identifying the number of clusters is demonstrated by applying it to publicly available gene expression data from multiple biological systems, such as yeast, human, mouse, and bacteria.

… providing me with the opportunity to pursue my Master's degree at the University of Maine. I would like to thank them for all their time, encouragement and guidance during my more than two years of graduate study. I would also like to thank Dr. Cristian Domnisoru for his kind assistance with my graduate research work as well as with many other things at the University of Maine. I wish to thank graduate coordinator Dr. … all of their time and assistance with the various courses I have taken. Thanks also go to Ms. Padma Natarajan for her care and encouragement. I want to thank all the other faculty members in the department and the other members of the Intelligent Systems Laboratory who helped me during my graduate study at the University of Maine. I would like to thank Dr. Su of the National Central University in Taiwan for his initial idea about double …
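The abstract describes counting clusters automatically by hierarchically clustering the final 2-D position vectors. The sketch below illustrates that general idea only; it is not the thesis's index, whose exact definition is not given here. It assumes single-linkage clustering and a "cut at the largest jump in merge height" rule, both of which are our own choices, and the function names `single_linkage_heights` and `estimate_num_clusters` are hypothetical.

```python
import math
from itertools import combinations


def single_linkage_heights(points):
    """Return the sorted merge heights of single-linkage hierarchical
    clustering of 2-D points (equivalently, the edge lengths of a minimum
    spanning tree, found here with Kruskal's algorithm and union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise Euclidean distances, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    heights = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two separate clusters
            parent[ri] = rj
            heights.append(d)
    return heights


def estimate_num_clusters(points):
    """Hypothetical index (assumption, not the thesis's definition): cut the
    single-linkage tree at the largest jump between consecutive merge heights
    and return how many clusters remain. Needs at least 3 points and always
    returns a value between 2 and n - 1."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 position vectors")
    h = single_linkage_heights(points)
    gaps = [h[k + 1] - h[k] for k in range(n - 2)]
    k = max(range(n - 2), key=gaps.__getitem__)
    # Merges 0..k are accepted, so n - (k + 1) clusters remain.
    return n - (k + 1)
```

For instance, seven position vectors that have settled into three tight groups, `[(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (10, 0), (10, 0.1)]`, give `estimate_num_clusters(...) == 3`. The largest-gap rule is a simple, common heuristic; it cannot report a single cluster, which is one reason a dedicated index such as the one the thesis introduces may be preferable.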