Clustering, Assessment and Validation: an application to gene expression data

This work introduces a multi-step approach to clustering assessment, visualization and data validation. Three clustering methods are used and compared: K-means, self-organizing maps and probabilistic principal surfaces. A model explorer approach with different similarity measures is used to select the best parameters for each method. The approach is applied to identify genes periodically expressed in tumors related to the human cell cycle. Finally, the clusters are validated using Gene Ontology (GO) term information.
