Challenges in Integrating Biological Data Sources

Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 and ACE). In addition, software packages such as sequence analysis packages (e.g., BLAST and FASTA) produce data and can therefore be viewed as data sources. To counter the increasing dispersion and heterogeneity of data, different approaches to integrating these data sources are appearing throughout the bioinformatics community. This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies.
