Databases for systems biology

ABSTRACT The ultimate goal of researchers in the interdisciplinary field of systems biology is to solve biological problems at the level of an entire system. Achieving this goal requires supporting the efforts of both experimental biologists and computational modelers. Ideally, the phases of planning, experimentation, and data analysis (as well as model development, testing, and validation) would all be supported by one database solution. There is currently no integrative source for all the information required in a computational model of a biological system, and no system capable of supporting all three phases of a systems biology project. We present the concept of an integrative database for systems biology that functions as a data warehouse and supports all three phases of such a project. The database system consists of three modules, each with a data model tailored to one of the three general types of data required: experimental data, components and reactions of biological systems, and mathematical models. The model and experiment modules are linked through the component/reaction module, so complete information about any one entity never needs to be stored more than once in the database. Complete functional models and simulations of particular interest are stored as SBML (Systems Biology Markup Language) files and linked to all necessary information within the database. This combination of modules, each tailored to a different data type and interacting with the others via links, is designed to meet the needs of researchers in systems biology.
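
The linking idea in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' schema: every class, field, and identifier below (Component, Reaction, Measurement, ModelRecord, Warehouse, the example IDs and file path) is a hypothetical name chosen for illustration. The sketch only shows how an experiment module and a model module might both refer to a shared component/reaction module by identifier, so that each entity is described once.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Component/reaction module: the single place where entities are described.
@dataclass
class Component:
    component_id: str
    name: str


@dataclass
class Reaction:
    reaction_id: str
    reactant_ids: List[str]
    product_ids: List[str]


# Experiment module: stores values and points at components by ID only.
@dataclass
class Measurement:
    experiment_id: str
    component_id: str
    value: float


# Model module: stores a path to an SBML file (hypothetical) plus reaction IDs.
@dataclass
class ModelRecord:
    model_id: str
    sbml_path: str
    reaction_ids: List[str]


@dataclass
class Warehouse:
    components: Dict[str, Component] = field(default_factory=dict)
    reactions: Dict[str, Reaction] = field(default_factory=dict)
    measurements: List[Measurement] = field(default_factory=list)
    models: Dict[str, ModelRecord] = field(default_factory=dict)

    def measurements_for_model(self, model_id: str) -> List[Measurement]:
        """Follow model -> reactions -> components -> measurements."""
        component_ids = set()
        for reaction_id in self.models[model_id].reaction_ids:
            reaction = self.reactions[reaction_id]
            component_ids.update(reaction.reactant_ids)
            component_ids.update(reaction.product_ids)
        return [m for m in self.measurements if m.component_id in component_ids]


# Illustrative data only; the values are made up for the example.
wh = Warehouse()
wh.components["glc"] = Component("glc", "glucose")
wh.components["g6p"] = Component("g6p", "glucose 6-phosphate")
wh.components["lac"] = Component("lac", "lactate")
wh.reactions["hk"] = Reaction("hk", reactant_ids=["glc"], product_ids=["g6p"])
wh.models["m1"] = ModelRecord("m1", "models/m1.xml", reaction_ids=["hk"])
wh.measurements.append(Measurement("exp1", "glc", 5.0))
wh.measurements.append(Measurement("exp1", "lac", 1.2))

linked = wh.measurements_for_model("m1")
print([(m.component_id, m.value) for m in linked])
```

Because measurements and models hold only identifiers, the description of glucose lives in one record, and the lookup from a model to its relevant experimental data passes through the component/reaction module, as the abstract describes.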
