SEMEDA: ontology based semantic integration of biological databases

MOTIVATION: Many molecular biological databases are implemented on relational Database Management Systems, which provide standard interfaces such as JDBC and ODBC for data and metadata exchange. With these interfaces, many technical problems of database integration vanish, but semantic issues remain, e.g. the use of different terms for the same things, different names for equivalent database attributes, and missing links between relevant entries in different databases.

RESULTS: This publication describes the principles and methods used to implement SEMEDA (Semantic Meta Database). Database owners can use SEMEDA to provide semantically integrated access to their databases and to collaboratively edit and maintain ontologies and controlled vocabularies. Biologists can use SEMEDA to query the integrated databases in real time without having to know the structure or any technical details of the underlying databases.

AVAILABILITY: SEMEDA is available at http://www-bm.ipk-gatersleben.de/semeda/. Database providers who intend to grant access to their databases via SEMEDA are encouraged to contact the authors.
