Self-organizing maps in mining gene expression data

Abstract: Modern DNA microarray technology makes it possible to measure gene expression patterns across the whole genome of simple organisms at once. Exploratory analysis of these large-scale expression datasets is becoming vital for extracting functional information from the measurements. We demonstrate how the self-organizing map (SOM) can be applied to exploratory analysis of gene expression data from a yeast DNA microarray database to rapidly find gene families with similar expression patterns. The SOM not only made it quick to select the gene families identified in previous work, but also helped identify additional genes with similar expression patterns. Identifying new gene families also appears possible, as shown by additional clusters of genes discovered in the data. Moreover, the SOM made explicit the primary pattern variations that discriminate between the families.
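To make the idea of grouping genes by expression pattern concrete, the following is a minimal sketch of an online SOM in NumPy. It is not the authors' implementation and does not use the yeast data: the function names (`train_som`, `map_units`), the 4x4 grid, the linear decay schedules, and the synthetic "rising" and "falling" profiles are all assumptions chosen for illustration.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM online; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Initialise each map unit's prototype from a randomly chosen data row.
    weights = data[rng.integers(0, n, size=rows * cols)].reshape(rows, cols, dim).copy()
    # Grid coordinates of every unit, needed by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # radius shrinks from sigma0 to 0.5
        x = data[rng.integers(0, n)]
        # Best-matching unit: the prototype closest to the sample.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighbourhood centred on the best-matching unit.
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbours towards the sample.
        weights += lr * h[..., None] * (x - weights)
    return weights

def map_units(data, weights):
    """Return the (row, col) of the best-matching unit for each data row."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    idx = np.argmin(d, axis=1)
    return np.stack(np.unravel_index(idx, weights.shape[:2]), axis=1)

# Synthetic stand-in for expression profiles (assumption, not real data):
# 30 "genes" rising and 30 falling over 7 time points, plus noise.
rng = np.random.default_rng(1)
t = np.linspace(-1, 1, 7)
rising = t + 0.1 * rng.standard_normal((30, 7))
falling = -t + 0.1 * rng.standard_normal((30, 7))
profiles = np.vstack([rising, falling])

weights = train_som(profiles)
units = map_units(profiles, weights)
```

Genes that land on the same or neighbouring map units have similar profiles, so inspecting `units` (or plotting each unit's prototype in `weights`) gives the kind of overview of expression families that the abstract describes.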
