Extraction and prediction of biomedical database identifier using neural networks towards data network construction

EXECUTIVE SUMMARY

Knowledge found in biomedical databases is a major bioinformatics resource. This biological knowledge is spread worldwide across a network of thousands of databases, which overlap in content but differ substantially in content detail, interfaces, formats, and data structures. To support functional annotation of lab data, such as protein sequences, metabolites, or DNA sequences, as well as semi-automated data exploration in information retrieval environments, an integrated view of the databases is essential. A prerequisite for such an integrated data view is insight into the cross-references among database entities.
