Is it worth the effort? Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Corpora with high-quality linguistic annotations are an essential component of many NLP applications and a valuable resource for linguistic research. Obtaining these annotations requires a large amount of manual effort, which makes creating such resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and to ask human annotators to correct them where necessary. However, it is not clear to what extent such automatic pre-annotation actually reduces human annotation effort, or what impact it has on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency, and accuracy. While we found no conclusive evidence that pre-annotation speeds up human annotation, we did find that it increases the overall quality of the resulting annotations.
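To make the workflow concrete, below is a minimal sketch of partial automatic pre-labeling as described above: a supervised model proposes frame labels, only its confident predictions are kept as pre-annotations, and the human annotator confirms or corrects them. The classifier, lexicon, and confidence threshold are illustrative assumptions, not the system used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Predicate:
    text: str
    pre_label: Optional[str] = None    # automatic suggestion, if confident
    final_label: Optional[str] = None  # label after human review

def predict_frame(word: str) -> Tuple[str, float]:
    """Stand-in for a supervised frame classifier (hypothetical lexicon)."""
    lexicon = {"buy": ("Commerce_buy", 0.92), "attach": ("Attaching", 0.55)}
    return lexicon.get(word, ("Unknown", 0.0))

def pre_annotate(predicates: List[Predicate], threshold: float = 0.8) -> None:
    """Attach automatic labels only where the model is confident,
    leaving low-confidence cases for manual annotation (hence 'partial')."""
    for p in predicates:
        label, confidence = predict_frame(p.text)
        if confidence >= threshold:
            p.pre_label = label

sentence = [Predicate("buy"), Predicate("attach")]
pre_annotate(sentence)
for p in sentence:
    # The annotator confirms or corrects suggestions; predicates without
    # a suggestion are labeled from scratch.
    print(p.text, "->", p.pre_label or "(annotate manually)")
```

In this sketch, the threshold controls how partial the pre-annotation is: raising it trades coverage for precision, which corresponds to the differing pre-annotation quality that the experiment varies.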
