Integrating Biological Data Sources and Data Analysis Tools through Mediators (available online only)

We present an architecture for XML-based mediator systems and a framework that helps system developers build mediator services for integrating heterogeneous data sources. A unique feature of our architecture is its ability to manage users' software tools and algorithms, which are modelled as Extended Value Added Services (EVASs) and integrated into the data flow. The mediator presents the system as a single data source in which EVASs are readily available to enhance query processing. A web-based graphical interface has been developed to allow dynamic and flexible interconnection of EVASs, thus creating complex distributed bioinformatics machines. The feasibility and usefulness of our ideas have been validated by the development of a mediator system (BioBroker) and by a diverse set of applications that combine gene expression data with genomic, sequence-based and structural information. The result is a general, transparent and powerful solution that, by integrating data analysis tools and algorithms, goes beyond the traditional gene expression data clustering mediators developed so far.
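The data flow described above can be illustrated with a small Python sketch. It is not BioBroker's implementation or API: the `Mediator` class, the wrapper and EVAS function signatures, the source names and the toy data are all assumptions made for illustration. It only shows the general idea of wrappers that return XML fragments, a mediator that merges them into one XML result, and analysis services chained onto that result.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List

# Hypothetical interfaces (not taken from the paper):
# a wrapper maps a gene identifier to XML elements from one source,
# an EVAS maps an XML result to an enriched XML result.
Wrapper = Callable[[str], List[ET.Element]]
Evas = Callable[[ET.Element], ET.Element]


def sequence_source(gene_id: str) -> List[ET.Element]:
    """Toy stand-in for a sequence database wrapper (made-up data)."""
    data = {"g1": "ATGCGC", "g2": "ATATAT"}
    if gene_id not in data:
        return []
    elem = ET.Element("sequence", {"gene": gene_id})
    elem.text = data[gene_id]
    return [elem]


def expression_source(gene_id: str) -> List[ET.Element]:
    """Toy stand-in for a gene expression wrapper (made-up data)."""
    data = {"g1": "2.5"}
    if gene_id not in data:
        return []
    return [ET.Element("expression", {"gene": gene_id, "level": data[gene_id]})]


def gc_content_evas(result: ET.Element) -> ET.Element:
    """Toy analysis service: annotate each <sequence> with its GC fraction."""
    for seq in result.iter("sequence"):
        text = seq.text or ""
        gc = sum(1 for base in text if base in "GC")
        seq.set("gc", f"{gc / len(text):.2f}" if text else "0.00")
    return result


class Mediator:
    """Presents several wrapped sources as one queryable XML source."""

    def __init__(self) -> None:
        self.wrappers: Dict[str, Wrapper] = {}
        self.evas: Dict[str, Evas] = {}

    def register_source(self, name: str, wrapper: Wrapper) -> None:
        self.wrappers[name] = wrapper

    def register_evas(self, name: str, service: Evas) -> None:
        self.evas[name] = service

    def query(self, gene_id: str, evas_chain: Iterable[str] = ()) -> ET.Element:
        # Collect each source's answer under a single XML root.
        root = ET.Element("result", {"gene": gene_id})
        for name, wrapper in self.wrappers.items():
            node = ET.SubElement(root, "source", {"name": name})
            node.extend(wrapper(gene_id))
        # Pass the merged result through the requested services, in order.
        for evas_name in evas_chain:
            root = self.evas[evas_name](root)
        return root


def build_demo_mediator() -> Mediator:
    mediator = Mediator()
    mediator.register_source("sequences", sequence_source)
    mediator.register_source("expression", expression_source)
    mediator.register_evas("gc_content", gc_content_evas)
    return mediator


if __name__ == "__main__":
    merged = build_demo_mediator().query("g1", evas_chain=["gc_content"])
    print(ET.tostring(merged, encoding="unicode"))
```

In this sketch the caller sees one `query` entry point regardless of how many sources are registered, and an analysis step is added simply by naming it in `evas_chain`. A real mediator would also need query translation, schema mapping and remote invocation, none of which is modelled here.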
