Poisson-Based Self-Organizing Feature Maps and Hierarchical Clustering for Serial Analysis of Gene Expression Data

Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling, allowing simultaneous analysis of thousands of transcripts without prior structural or functional knowledge. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data, and from the pattern discovery perspective, clustering techniques have received great attention. However, because of the statistical nature of SAGE data (i.e., their underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. By adapting and improving self-organizing maps and hierarchical clustering techniques, this paper presents two new clustering algorithms, PoissonS and PoissonHC, for SAGE data analysis. When tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating the statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid neuro-hierarchical approach that combines PoissonS and PoissonHC, offer significant improvements in pattern discovery and visualization for SAGE data. Moreover, a user-friendly platform that may improve and accelerate SAGE data mining was implemented. The system is freely available on request from the authors for nonprofit use.
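
The abstract does not specify how PoissonS and PoissonHC measure similarity between tags. The following is a minimal sketch of what "incorporating statistical properties of SAGE data" can look like, under several assumptions:

- Each tag's counts across libraries follow y[t] ~ Poisson(lam * theta[t]), with theta summing to 1.
- A chi-square goodness-of-fit statistic serves as the dissimilarity. This is an assumption and not necessarily the authors' exact measure.

Every function name and parameter default below (poisson_chisq, train_som, poisson_agglomerative, and so on) is hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating Poisson-based clustering of SAGE tag counts.
# Assumed model (not taken from the paper): y[t] ~ Poisson(lam * theta[t]),
# where y holds one tag's counts across libraries and theta sums to 1.


def poisson_chisq(y, theta, eps=1e-12):
    """Chi-square dissimilarity between a count vector y and a profile theta."""
    y = np.asarray(y, dtype=float)
    lam = y.sum()  # MLE of lam when theta sums to 1
    expected = np.maximum(lam * np.asarray(theta, dtype=float), eps)
    return float(np.sum((y - expected) ** 2 / expected))


def pooled_profile(Y):
    """MLE of a profile shared by all rows of Y: column totals / grand total."""
    col = np.asarray(Y, dtype=float).sum(axis=0)
    total = col.sum()
    if total == 0:
        return np.full(col.shape, 1.0 / col.size)
    return col / total


def assign(Y, nodes):
    """Index of the best-matching node (smallest chi-square) for each row."""
    return [int(np.argmin([poisson_chisq(y, th) for th in nodes]))
            for y in np.asarray(Y, dtype=float)]


def train_som(Y, n_nodes=4, n_epochs=20, alpha0=0.5, sigma0=1.5, seed=0):
    """1-D SOM whose nodes are Poisson profiles matched by chi-square."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    nodes = rng.dirichlet(np.ones(Y.shape[1]), size=n_nodes)  # rows sum to 1
    grid = np.arange(n_nodes)
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        alpha = alpha0 * (1 - frac)            # decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood width
        for i in rng.permutation(len(Y)):
            y = Y[i]
            if y.sum() == 0:
                continue  # an all-zero tag carries no profile information
            bmu = assign([y], nodes)[0]
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            # For alpha * h <= 1 this is a convex step toward the tag's
            # relative profile, so every node row keeps summing to 1.
            nodes += (alpha * h)[:, None] * (y / y.sum() - nodes)
    return nodes


def cluster_cost(Y, members):
    """Total chi-square of the members against their pooled profile."""
    theta = pooled_profile(Y[members])
    return sum(poisson_chisq(Y[i], theta) for i in members)


def poisson_agglomerative(Y, n_clusters):
    """Greedy agglomeration: merge the pair with the smallest cost increase."""
    Y = np.asarray(Y, dtype=float)
    clusters = [[i] for i in range(len(Y))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                delta = (cluster_cost(Y, clusters[a] + clusters[b])
                         - cluster_cost(Y, clusters[a])
                         - cluster_cost(Y, clusters[b]))
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

Both routines take the data as a tag-by-library count matrix Y. `poisson_agglomerative(Y, k)` returns k lists of row indices. `assign(Y, train_som(Y))` returns one node index per tag. The greedy pair search re-scores every pair at each merge, so it only suits small illustrative inputs. A hybrid in the spirit of the neuro-hierarchical approach could train the SOM first and then agglomerate its node profiles; that pipeline is likewise an assumption, not the authors' procedure.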