OnTheFly 2.0: A tool for automatic annotation of files and biological information extraction

Retrieving all the necessary database information about the bioentities mentioned in an article is not a trivial task. Following the daily literature on a specific biological topic and manually collecting the relevant information about the bioentities it mentions is tedious and time-consuming. OnTheFly 2.0 is a web application, designed mainly for non-computer experts, that aims to automate data collection and knowledge extraction from biological literature in a user-friendly and efficient way. It can extract bioentities from individual articles in formats such as plain text, Microsoft Word, Excel and PDF. With a simple drag-and-drop motion, the text of a document is extensively parsed for bioentities such as protein/gene names and chemical compound names. Drawing on high-quality data integration platforms, OnTheFly 2.0 generates informative summaries, interaction networks and at-a-glance popup windows containing knowledge related to the bioentities found in the documents. OnTheFly 2.0 thus provides a concise way to automate the extraction of bioentities hidden in various documents. It is available at: http://onthefly.embl.de, http://onthefly.med.uoc.gr or http://onthefly.hcmr.gr.
