Integration of biological sources: current systems and challenges ahead

This paper surveys the integration of biological and genomic data sources, which has recently become a major focus of data integration research. An integration system for biological sources faces challenges that stem from several factors: the variety and volume of available data, the representational heterogeneity of the data across sources, and the autonomy and differing capabilities of the sources. The survey first describes the main integration approaches that have been adopted: warehouse integration, mediator-based integration, and navigational integration. It then examines four major integration systems developed for the biological domain: SRS, BioKleisli, TAMBIS, and DiscoveryLink. After analyzing these systems and mentioning a few others, we identify the pros and cons of the current approaches and systems and discuss what an integration system for biologists ought to be.
