IDPredictor: predict database links in biomedical database

Summary: Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. This biological knowledge is represented worldwide in a network of databases. The data are spread among thousands of databases, which overlap in content but differ substantially with respect to content detail, interface, formats and data structure. To support functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as semi-automated data exploration in information retrieval environments, an integrated view of the databases is essential. Search engines have the potential to assist in data retrieval from these structured sources, but fall short of providing a comprehensive knowledge excerpt from the interlinked databases. A prerequisite for an integrated data view is insight into the cross-references among database entities. This is hampered by the fact that only a fraction of all possible cross-references are explicitly tagged in the particular biomedical information systems. In this work, we investigate to what extent an automated construction of an integrated data network is possible. We propose a method that predicts and extracts cross-references from multiple life science databases and possible referenced data targets. We study the retrieval quality of our method and report on first, promising results. The method is implemented as the tool IDPredictor, which is published under the DOI 10.5447/IPK/2012/4 and is freely available at the URL: http://dx.doi.org/10.5447/IPK/2012/4.
