Flexible Integration of Molecular-Biological Annotation Data: The GenMapper Approach

Molecular-biological annotation data is continuously collected, curated, and made accessible in numerous public data sources, and integrating this data is a major challenge in bioinformatics. We present GenMapper, a system that physically integrates heterogeneous annotation data in a flexible way and supports large-scale analysis of the integrated data. It uses a generic data model to represent different kinds of annotations from different data sources uniformly. Existing associations between objects, which represent valuable biological knowledge, are used explicitly to drive data integration and to combine annotation knowledge from different sources. To serve specific analysis needs, powerful operators derive tailored annotation views from the generic data representation. GenMapper is operational and has been used successfully for large-scale functional profiling of genes. Interactive access is provided at http://www.izbi.de.
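
The ideas of a generic data model, association-driven integration, and derived annotation views can be illustrated with a minimal sketch. It is not GenMapper's actual schema or operator set, which the abstract does not spell out. All names here (`Obj`, `AnnotationStore`, `compose`, `view`) and the source names and accessions in the usage lines are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """A generic object: any entry of any source, identified by accession."""
    source: str      # hypothetical source name, e.g. "GeneDB"
    accession: str   # identifier of the entry within that source


class AnnotationStore:
    """Stores associations between objects, grouped by source pair (assumed design)."""

    def __init__(self):
        # (from_source, to_source) -> {from_obj: set of to_obj}
        self._maps = defaultdict(lambda: defaultdict(set))

    def associate(self, a, b):
        # Record the association in both directions.
        self._maps[(a.source, b.source)][a].add(b)
        self._maps[(b.source, a.source)][b].add(a)

    def mapping(self, src, dst):
        # Return a copy of all stored associations from src objects to dst objects.
        return {a: set(bs) for a, bs in self._maps[(src, dst)].items()}


def compose(m1, m2):
    """Derive a new mapping A -> C by chaining A -> B and B -> C."""
    out = {}
    for a, bs in m1.items():
        cs = set()
        for b in bs:
            cs |= m2.get(b, set())
        if cs:
            out[a] = cs
    return out


def view(store, src, targets):
    """Build a tabular annotation view: one row per src object, one column per target."""
    rows = {}
    for t in targets:
        for a, bs in store.mapping(src, t).items():
            rows.setdefault(a.accession, {})[t] = sorted(b.accession for b in bs)
    return rows


# Usage with made-up sources and accessions.
store = AnnotationStore()
gene, protein, term = Obj("GeneDB", "G1"), Obj("ProteinDB", "P1"), Obj("TermDB", "T1")
store.associate(gene, protein)
store.associate(protein, term)

# No direct gene-to-term association was imported, but one can be derived.
gene_to_term = compose(store.mapping("GeneDB", "ProteinDB"),
                       store.mapping("ProteinDB", "TermDB"))
```

In this sketch a new source adds only new `Obj` values and associations, with no schema change, and source-specific views are computed on demand from the generic representation.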
