Slate - A Tool for Creating and Maintaining Annotated Corpora

Research trends over the last five years show that richly annotated corpora inspire novel research. Such corpora are indispensable for advancing research, but their increasing complexity also makes them more difficult to manage and maintain; what is needed is a way to manage the annotation project in its entirety. However, annotation project management has received little attention, with tools predominantly focusing on the annotation of single documents. We therefore define a list of corpus creation and management needs for annotation systems, and then introduce Slate, our multi-purpose annotation and management system, which is designed to address these needs. Through a case study, we show how project management is essential to creating good corpora.
