MeMo: a hybrid SQL/XML approach to metabolomic data management for functional genomics

Background: Genome sequencing projects have exposed how limited our knowledge of gene function is: S. cerevisiae, for example, has 5,000–6,000 genes, of which nearly 1,000 have an uncertain function. The gross influence of these genes on the behaviour of the cell can be observed using large-scale metabolomic studies. The metabolomic data produced need to be structured and annotated in a machine-usable form to facilitate the exploration of the hidden links between the genes and their functions.

Description: MeMo is a formal model for representing metabolomic data and the associated metadata. Two predominant platforms (SQL and XML) are used to encode the model. MeMo has been implemented as a relational database using a hybrid approach that combines the advantages of the two technologies. It represents a practical solution for handling the volume and complexity of metabolomic data effectively and efficiently. The MeMo model and the associated software are available at http://dbkgroup.org/memo/.

Conclusion: The maturity of relational database technology is used to support efficient data processing. The scalability and self-descriptiveness of XML are used to simplify the relational schema and to facilitate the extensibility of the model that new experimental techniques make necessary. Special consideration is given to data integration issues as part of the systems biology agenda. MeMo has been physically integrated and cross-linked to related metabolomic and genomic databases. Semantic integration with other relevant databases is supported through ontological annotation, and compatibility with other data formats is supported by automatic conversion.
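The hybrid idea can be illustrated with a minimal sketch. It is not the MeMo schema: the table names, column names, XML element names and sample values below are all hypothetical, and SQLite stands in for whichever relational engine is actually used. The sketch assumes that stable, frequently queried attributes live in relational columns, while technique-specific metadata is stored as an XML fragment, so that a new technique needs no schema change.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema (not MeMo's): stable attributes are relational columns,
# technique-specific settings are kept as XML text in a single column.
SCHEMA = """
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    strain    TEXT NOT NULL
);
CREATE TABLE run (
    run_id       INTEGER PRIMARY KEY,
    sample_id    INTEGER NOT NULL REFERENCES sample(sample_id),
    technique    TEXT NOT NULL,
    metadata_xml TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding made-up illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sample VALUES (1, 'strain-A')")
    conn.executemany(
        "INSERT INTO run VALUES (?, ?, ?, ?)",
        [
            (1, 1, "GC-MS",
             "<instrument>"
             "<parameter name='carrier_gas'>helium</parameter>"
             "<parameter name='scan_rate_hz'>20</parameter>"
             "</instrument>"),
            # A different technique brings different parameters, yet the
            # relational schema stays the same.
            (2, 1, "NMR",
             "<instrument>"
             "<parameter name='field_mhz'>600</parameter>"
             "</instrument>"),
        ],
    )
    return conn


def runs_for_strain(conn, strain):
    """Relational side: an ordinary join over the structured columns."""
    rows = conn.execute(
        "SELECT r.run_id, r.technique "
        "FROM run AS r JOIN sample AS s ON s.sample_id = r.sample_id "
        "WHERE s.strain = ? ORDER BY r.run_id",
        (strain,),
    )
    return rows.fetchall()


def instrument_settings(conn, run_id):
    """XML side: parse the self-describing metadata of one run into a dict."""
    row = conn.execute(
        "SELECT metadata_xml FROM run WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(run_id)
    root = ET.fromstring(row[0])
    return {p.get("name"): p.text for p in root.iter("parameter")}
```

In this sketch, `runs_for_strain(conn, "strain-A")` uses only SQL, whereas `instrument_settings(conn, 2)` reads parameters that exist for one technique only. The trade-off is that values held inside the XML column are not indexed by the relational engine.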
