An Evaluation of Multiple Approaches for Federating Biological Data

Driven by technological breakthroughs and new computational approaches, the quantity of biological data is growing explosively: as of 2007, there were 1078 biological databases. Providing biologists with central, uniform access to all the types of data stored in these databases is becoming critical. To minimize disruption of current operations, maintain local autonomy and handle heterogeneity, federated databases and Web services have been proposed as viable solutions. This paper explores these issues and reports on our experience testing multiple approaches to biological database integration. It discusses the trade-offs among performance, support for heterogeneity, robustness and scalability. A significant result of our study is that the most flexible approach, Web services, performs very competitively.
