tagtog: Interactive Human and Machine Annotation of Gene Mentions in PLOS Full-Text Articles

We present the tagtog system, a web-based annotation framework that can be used to mark up biological entities and concepts in full-text articles. tagtog combines manual user annotations with automatic machine-learned annotations to provide accurate identification of gene symbols and names in the biomedical literature. For this submission we present, in collaboration with the FlyBase database curation team at Cambridge University, the task of identifying and extracting mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We report the results of three experiments with corpora of different sizes and assess gene recognition performance. Finally, we invite biocurators at the BioCreative IV Track 5 - User Interactive Task (IAT) to come and find us and try tagtog themselves.

Introduction

tagtog (http://tagtog.net) is a web-based framework for the annotation of named entities. A user creates a project, defines a named-entity recognition task (such as gene annotation), and uploads a set of text documents to the system. Each document is then displayed in a web editor where the user can add, delete, or correct the information relevant to the annotation task. An example of the user interface is shown in Figure 1. The user can annotate an entity by selecting the corresponding word(s) and can remove the annotation by clicking on the selection again. During the