Quick Pad Tagger: An Efficient Graphical User Interface for Building Annotated Corpora with Multiple Annotation Layers

More and more domain-specific applications on the internet make use of Natural Language Processing (NLP) tools (e.g., Information Extraction systems). The output quality of these applications depends on the output quality of the NLP tools they use. Often, this quality can be improved by annotating a domain-specific corpus. However, annotating a corpus is a time-consuming and exhausting task. To reduce annotation time, we present a custom Graphical User Interface for different annotation layers.
