Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation.

Array technologies have made it straightforward to monitor simultaneously the expression pattern of thousands of genes. The challenge now is to interpret such massive data sets. The first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of self-organizing maps, a type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data. The method has been implemented in a publicly available computer package, GENECLUSTER, that performs the analytical calculations and provides easy data visualization. To illustrate the value of such analysis, the approach is applied to hematopoietic differentiation in four well-studied models (HL-60, U937, Jurkat, and NB4 cells). Expression patterns of some 6,000 human genes were assayed, and an online database was created. GENECLUSTER was used to organize the genes into biologically relevant clusters that suggest novel hypotheses about hematopoietic differentiation, for example by highlighting certain genes and pathways involved in the "differentiation therapy" used in the treatment of acute promyelocytic leukemia.
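
The clustering idea behind a self-organizing map can be sketched in a few lines of Python. This is a minimal illustration and not the GENECLUSTER implementation: the function names (`normalize`, `train_som`, `cluster_genes`), the grid size, the linear learning-rate and radius schedules, the per-gene normalization, and the toy expression profiles are all assumptions made for the example. Each grid node holds a reference profile; at every step a randomly chosen gene pulls its closest node, and that node's grid neighbors, toward its own profile.

```python
import math
import random


def normalize(profile):
    """Scale one expression profile to mean 0 and variance 1 (assumed preprocessing)."""
    mean = sum(profile) / len(profile)
    var = sum((v - mean) ** 2 for v in profile) / len(profile)
    std = math.sqrt(var)
    if std == 0:
        return [0.0 for _ in profile]
    return [(v - mean) / std for v in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_node(nodes, x):
    """Grid coordinate of the node whose reference profile is closest to x."""
    return min(nodes, key=lambda k: sq_dist(nodes[k], x))


def train_som(data, rows, cols, iterations=2000, lr0=0.1, seed=0):
    """Train a rows x cols SOM; returns {(row, col): reference profile}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Start every node at a randomly chosen data point.
    nodes = {(r, c): list(rng.choice(data))
             for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)          # learning rate decays linearly
        radius = radius0 * (1.0 - frac)  # neighborhood shrinks linearly
        x = rng.choice(data)
        br, bc = nearest_node(nodes, x)
        for (r, c), w in nodes.items():
            # Move the winning node and its grid neighbors toward x.
            if math.hypot(r - br, c - bc) <= radius:
                for i in range(dim):
                    w[i] += lr * (x[i] - w[i])
    return nodes


def cluster_genes(profiles, rows, cols):
    """Map each gene name to the grid node that best matches its profile."""
    normed = {gene: normalize(p) for gene, p in profiles.items()}
    nodes = train_som(list(normed.values()), rows, cols)
    return {gene: nearest_node(nodes, p) for gene, p in normed.items()}


if __name__ == "__main__":
    # Hypothetical profiles over four time points (not data from the paper).
    toy = {
        "geneA": [1, 2, 3, 4],
        "geneB": [10, 20, 30, 40],
        "geneC": [4, 3, 2, 1],
        "geneD": [40, 30, 20, 10],
    }
    for gene, node in cluster_genes(toy, rows=1, cols=2).items():
        print(gene, "-> node", node)
```

Normalizing each profile first means genes are grouped by the shape of their expression pattern rather than by absolute level, which is why `geneA` and `geneB` in the toy data are treated as the same pattern. The grid geometry is a user choice, and larger grids give finer-grained clusters.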
