Expression cartography of human tissues using self organizing maps

Background

Parallel high-throughput microarray and sequencing experiments produce vast quantities of multidimensional data, which must be arranged and analyzed in a concerted way. One approach to this challenge is the machine learning technique known as self organizing maps (SOMs). SOMs enable a parallel sample- and gene-centered view of genomic data, combined with strong visualization and second-level analysis capabilities. The paper aims to bridge the gap between the power of SOM machine learning to reduce the dimension of high-dimensional data on the one hand and practical applications, with special emphasis on gene expression analysis, on the other.

Results

The method was applied to generate a SOM characterizing the whole-genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of the expression data from tens of thousands of genes to a few thousand metagenes, each representing a minicluster of co-regulated single genes. Tissue-specific properties, and common properties shared between groups of tissues, emerge as a handful of localized spots in the tissue maps, each collecting a group of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to pre-defined gene sets of known functional impact. We found that tissue-related spots typically contain enriched populations of genes related to specific molecular processes in the respective tissue. Analysis techniques normally used at the gene level, such as two-way hierarchical clustering, are better represented and provide better signal-to-noise ratios if applied to the metagenes. Metagene-based clustering analyses aggregate the tissues broadly into three clusters containing nervous tissues, immune system tissues and the remaining tissues.

Conclusions

The SOM technique provides a more intuitive and informative global view of the behavior of a few well-defined modules of correlated and differentially expressed genes than the separate discovery of the expression levels of hundreds or thousands of individual genes. The program is available as the R package 'oposSOM'.
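
To make the reduction from genes to metagenes concrete, the following is a minimal sketch of a generic online SOM in Python with NumPy. It is not the oposSOM implementation: the function names `train_som` and `assign_genes`, the 5 x 5 grid, the linear decay schedules and the random toy data are all assumptions chosen for illustration. Each map unit holds one metagene profile across samples, and one sample's column of metagene values, reshaped to the grid, gives that sample's map.

```python
import numpy as np

def train_som(data, grid=(5, 5), n_iter=2000, lr0=0.5, seed=0):
    """Train a rectangular SOM (illustrative sketch, not oposSOM).

    data: array of shape (n_genes, n_samples), one expression profile per gene.
    Returns an array of shape (rows * cols, n_samples): one metagene per unit.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    # Grid coordinates of every unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the metagene profiles from randomly chosen gene profiles.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks towards 0.5
        x = data[rng.integers(0, len(data))]        # pick one gene profile at random
        # Best-matching unit: the metagene closest to this gene profile.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighbourhood on the grid, centred on the best-matching unit.
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Pull the winner and its grid neighbours towards the gene profile.
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_genes(data, weights):
    """Return, for every gene, the index of its closest metagene (minicluster)."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Toy data: 200 hypothetical genes measured in 6 hypothetical samples.
toy_rng = np.random.default_rng(1)
expr = toy_rng.normal(size=(200, 6))

metagenes = train_som(expr, grid=(5, 5))      # 25 metagenes x 6 samples
gene_to_unit = assign_genes(expr, metagenes)  # minicluster membership per gene
portrait = metagenes[:, 0].reshape(5, 5)      # map of the first sample
```

In this sketch the 200 gene profiles are compressed into 25 metagene profiles, and downstream steps such as clustering can be run on `metagenes` instead of `expr`. Neighbouring units are updated together, so similar profiles end up close to each other on the grid, which is why co-regulated metagenes can appear as localized spots in the maps.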
