The BioMediator System as a Tool for Integrating Biologic Databases on the Web

BioMediator is a data integration system tailored to the domain of molecular biology. Based on our collaborations with biologists, we have identified several challenges in building a data integration system that meets their needs. BioMediator provides a common interface to several Web-accessible data sources, using a novel source knowledge base to organize metadata about those sources. This approach allows BioMediator to answer poorly specified queries, to re-use wrappers, and to support multiple mediated schemas that can easily be modified. We describe the system architecture, focusing on query processing and data access, and conclude by comparing our approach to the classic federated approach.
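The abstract describes the source knowledge base only at a high level. The Python sketch below is a hypothetical illustration, not BioMediator's actual code. It assumes that the knowledge base records, for each source, which entity types the source returns and which wrapper accesses it, plus directed links between entity types. Under those assumptions, a loosely specified query that names only a start concept and a goal concept can be answered by searching for a chain of linked entity types; each mediated schema becomes a mapping from its own terms onto the shared entity types; and sources that use the same access protocol share one wrapper. All class, function, source, and wrapper names (`Source`, `SourceKB`, `plan`, `SourceA`, `http_xml`, and so on) are invented for this example.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical metadata about one Web-accessible data source."""
    name: str
    wrapper: str        # key of the wrapper used to access this source
    entities: set[str]  # entity types this source can return


@dataclass
class SourceKB:
    """Hypothetical source knowledge base: sources plus links between entity types."""
    sources: dict[str, Source] = field(default_factory=dict)
    links: dict[str, set[str]] = field(default_factory=dict)

    def add_source(self, src: Source) -> None:
        self.sources[src.name] = src

    def add_link(self, frm: str, to: str) -> None:
        self.links.setdefault(frm, set()).add(to)

    def sources_for(self, entity: str) -> list[str]:
        # All sources able to return this entity type, in a stable order.
        return sorted(s.name for s in self.sources.values() if entity in s.entities)

    def wrappers_in_use(self) -> set[str]:
        # Distinct wrappers needed; several sources may share (re-use) one wrapper.
        return {s.wrapper for s in self.sources.values()}

    def path(self, start: str, goal: str) -> list[str] | None:
        # Breadth-first search for the shortest chain of linked entity types.
        queue = deque([[start]])
        seen = {start}
        while queue:
            p = queue.popleft()
            if p[-1] == goal:
                return p
            for nxt in sorted(self.links.get(p[-1], ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(p + [nxt])
        return None


def plan(kb: SourceKB, schema: dict[str, str], start_term: str, goal_term: str):
    """Resolve schema terms to KB entity types, then list the sources for each hop."""
    p = kb.path(schema[start_term], schema[goal_term])
    if p is None:
        return []
    return [(entity, kb.sources_for(entity)) for entity in p]


# Usage: three hypothetical sources, two of which share the same wrapper.
kb = SourceKB()
kb.add_source(Source("SourceA", "http_xml", {"Gene"}))
kb.add_source(Source("SourceB", "http_xml", {"Protein"}))
kb.add_source(Source("SourceC", "flatfile", {"Protein", "Structure"}))
kb.add_link("Gene", "Protein")
kb.add_link("Protein", "Structure")

# Two mediated schemas that use different terms for the same KB entity types.
schema_v1 = {"gene": "Gene", "structure": "Structure"}
schema_v2 = {"locus": "Gene", "3d_model": "Structure"}

# The query names only its endpoints; the intermediate "Protein" hop is inferred.
print(plan(kb, schema_v1, "gene", "structure"))
# [('Gene', ['SourceA']), ('Protein', ['SourceB', 'SourceC']), ('Structure', ['SourceC'])]
```

Because both schemas resolve to the same knowledge-base entities, adding or editing a schema in this sketch changes only a small dictionary, while the source descriptions and wrappers stay untouched. Breadth-first search is used here simply as one way to pick the shortest chain of links; a real mediator may rank or combine alternative paths differently.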
