Adaptable data management for systems biology investigations

Background: In research, every experiment is different: the focus changes, and the data is generated by a continually evolving range of technologies. New techniques are continually introduced, ranging from in-house protocols to high-throughput instrumentation. Supporting these requirements calls for data management systems that can be built rapidly and adapted readily for new uses.

Results: The adaptable data management system discussed here is designed to support the seamless mining and analysis of the biological experiment data commonly used in systems biology (e.g. ChIP-chip, gene expression, proteomics, imaging, flow cytometry). We use different content graphs to represent different views of the data. These views are designed for different roles: equipment-specific views are used to gather instrumentation information; data-processing-oriented views enable the rapid development of analysis applications; and research-project-specific views organize information for individual research experiments. The management system allows for both the rapid introduction of new types of information and the evolution of the knowledge it represents.

Conclusion: Data management is an important aspect of any research enterprise. It is the foundation on which most applications are built, and it must be easily extended to serve new functionality for new scientific areas. We have found that adopting a three-tier architecture for data management, built around distributed standardized content repositories, allows us to rapidly develop new applications to support a diverse user community.