*omeSOM: a software for clustering and visualization of transcriptional and metabolite data mined from interspecific crosses of crop plants

Background
Modern biology uses experimental systems that explore the phenotypic variation arising from the recombination of several genomes. Such systems are useful for investigating the functional evolution of metabolic networks. One such approach is the analysis of transcript and metabolite profiles. Studies of this kind generate large amounts of data, which require dedicated computational tools for their analysis.

Results
This paper presents a novel software named *omeSOM (transcript/metabol-ome Self Organizing Map) that implements a neural model for biological data clustering and visualization. It allows the discovery of relationships between changes in transcripts and metabolites of crop plants harboring introgressed exotic alleles; furthermore, its use can be extended to other types of omics data. The software focuses on the easy identification of groups that include different molecular entities, independently of the number of clusters formed. The *omeSOM software provides easy-to-visualize interfaces for identifying coordinated variations in co-expressed genes and co-accumulated metabolites. Additionally, this information is linked to the most widely used gene annotation and metabolic pathway databases.

Conclusions
*omeSOM is a software designed to support the data mining of metabolic and transcriptional datasets derived from different databases. It provides a user-friendly interface and offers several visualization features that are easy for non-expert users to understand. *omeSOM is therefore applicable to basic research as well as to applied breeding programs. The software and a sample dataset are available free of charge at http://sourcesinc.sourceforge.net/omesom/.
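
As a rough illustration of the kind of self-organizing map clustering described above, the following is a minimal pure-Python sketch and not the *omeSOM implementation. The function names (`train_som`, `best_unit`, `mixed_units`), the map size, the linearly decaying learning rate and neighbourhood width, and the toy profiles are all assumptions made for illustration. It trains a small map on transcript and metabolite profiles together, then reports the map units that hold both kinds of molecular entity.

```python
import math
import random


def best_unit(weights, x):
    """Return the grid position of the unit whose weights are closest to x."""
    return min(
        weights,
        key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns weights keyed by (row, col)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialise each unit from a randomly chosen input profile.
    weights = {
        (r, c): list(rng.choice(data))
        for r in range(rows)
        for c in range(cols)
    }
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            br, bc = best_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def mixed_units(weights, profiles):
    """Group profiles by best-matching unit; keep units holding both kinds."""
    members = {}
    for name, (kind, vec) in profiles.items():
        members.setdefault(best_unit(weights, vec), []).append((name, kind))
    return {u: m for u, m in members.items() if len({k for _, k in m}) > 1}


# Hypothetical, already normalised profiles across four introgression lines.
profiles = {
    "gene_A": ("transcript", [1.0, -1.0, 1.0, -1.0]),
    "met_X": ("metabolite", [0.9, -1.1, 1.0, -0.9]),
    "gene_B": ("transcript", [-1.0, 1.0, -1.0, 1.0]),
    "met_Y": ("metabolite", [-1.0, 0.9, -1.1, 1.0]),
}
weights = train_som([vec for _, vec in profiles.values()], rows=2, cols=2)
for unit, members in sorted(mixed_units(weights, profiles).items()):
    print(unit, members)
```

Grouping by best-matching unit rather than by a fixed number of clusters mirrors the stated goal of finding mixed groups independently of how many clusters form; a real analysis would need a larger map and properly normalised data.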
