An Extensible, Scalable Architecture for Managing Bioinformatics Data and Analyses

Systems biology research requires tools that span a comprehensive range of computational capabilities, including data management, transfer, processing, integration, and interpretation. To address these needs, we have created the Bioinformatics Resource Manager (BRM), a scalable, flexible, and easy-to-use tool that lets biologists undertake complex analyses. This paper describes the underlying software architecture of the BRM, which integrates multiple commodity platforms to provide a highly extensible and scalable software infrastructure for bioinformatics. The architecture combines a J2EE 3-tier application with an archival experimental data management system, the GAGGLE framework for desktop tool integration, and the MeDICi Integration Framework for high-throughput data analysis workflows. The result is a systems biology software solution that supports the entire spectrum of scientific activities, from experimental data access to high-throughput processing and analysis of data, for biologists and experimental scientists.
