Technologies for Integrating Biological Data

Building a new database for a field of study in biomedicine involves transforming, integrating and cleansing multiple data sources, as well as adding new material and annotations. This paper reviews some of the requirements of a general solution to this data integration problem and surveys several representative technologies and approaches to data integration in biomedicine. It then highlights the features that distinguish the more general data integration technologies from the more specialised ones.
