Crowdsourcing and Aggregating Nested Markable Annotations

One of the key steps in language resource creation is the identification of the text segments to be annotated, or markables, which depending on the task may range from nominal chunks for named entity recognition, to (potentially nested) noun phrases, or mentions, for coreference resolution, to larger text segments in text segmentation. Markable identification is typically carried out semi-automatically, by running a markable identifier and correcting its output by hand; increasingly, this correction step is performed by annotators recruited through crowdsourcing, whose responses are then aggregated. In this paper, we present a method for identifying markables for coreference annotation that combines high-performance automatic markable detectors with human verification through a Game-With-A-Purpose (GWAP) and aggregation using a Bayesian annotation model. The method was evaluated both on news data and on data from a variety of other genres, and yields an improvement of over seven percentage points in mention-boundary F1 over a state-of-the-art, domain-independent automatic mention detector, and of almost three points over an in-domain mention detector. A key contribution of our proposal is its applicability to the case in which markables are nested, as coreference markables are; moreover, the GWAP and several of the proposed markable detectors are task- and language-independent, and are thus applicable to a variety of other annotation scenarios.
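Since the abstract only names the aggregation and evaluation steps, the sketch below illustrates one simple way such a pipeline can be realised; it is not the model used in the paper. Each candidate span, whether proposed by an automatic detector or judged by a game player, is treated as an independent binary item, the judgments are aggregated with a Dawid-Skene-style EM (a basic stand-in for the Bayesian annotation model mentioned above), and accepted spans are scored with exact mention-boundary F1. All function names and the toy data are hypothetical.

    from collections import defaultdict

    def dawid_skene_binary(judgments, n_iter=50):
        """judgments: iterable of (item, annotator, label) triples with label in {0, 1}.
        Returns {item: P(item is a true markable)} under a two-class Dawid-Skene model."""
        by_item = defaultdict(list)       # item -> [(annotator, label), ...]
        by_annotator = defaultdict(list)  # annotator -> [(item, label), ...]
        for item, annotator, label in judgments:
            by_item[item].append((annotator, label))
            by_annotator[annotator].append((item, label))

        # Initialise each item's posterior with its proportion of positive votes.
        post = {i: sum(y for _, y in labs) / len(labs) for i, labs in by_item.items()}

        for _ in range(n_iter):
            # M-step: class prior plus per-annotator sensitivity and specificity.
            prior = sum(post.values()) / len(post)
            sens, spec = {}, {}
            for a, labs in by_annotator.items():
                tp = fn = fp = tn = 1e-6  # smoothing keeps probabilities away from 0/1
                for i, y in labs:
                    if y == 1:
                        tp += post[i]
                        fp += 1.0 - post[i]
                    else:
                        fn += post[i]
                        tn += 1.0 - post[i]
                sens[a] = tp / (tp + fn)
                spec[a] = tn / (tn + fp)
            # E-step: recompute the posterior that each candidate span is a markable.
            for i, labs in by_item.items():
                p1, p0 = prior, 1.0 - prior
                for a, y in labs:
                    p1 *= sens[a] if y == 1 else 1.0 - sens[a]
                    p0 *= 1.0 - spec[a] if y == 1 else spec[a]
                post[i] = p1 / (p1 + p0)
        return post

    def boundary_f1(pred, gold):
        """Exact-boundary mention F1; nested markables are simply distinct (start, end) pairs."""
        pred, gold = set(pred), set(gold)
        tp = len(pred & gold)
        p = tp / len(pred) if pred else 0.0
        r = tp / len(gold) if gold else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    # Toy example: spans are (start, end) token offsets; a span nested inside another
    # is just a different item, so nesting needs no special handling at this stage.
    judgments = [
        ((0, 1), "detector_A", 1), ((0, 1), "player_1", 1), ((0, 1), "player_2", 1),
        ((0, 3), "detector_A", 1), ((0, 3), "player_1", 0), ((0, 3), "player_2", 1),
        ((2, 3), "detector_A", 0), ((2, 3), "player_1", 1), ((2, 3), "player_2", 0),
    ]
    posterior = dawid_skene_binary(judgments)
    accepted = [span for span, p in posterior.items() if p >= 0.5]
    print(boundary_f1(accepted, gold=[(0, 1), (0, 3)]))

Because every (start, end) pair is its own item in this sketch, nested markables require no special machinery during aggregation; nesting only affects how candidate spans are generated and presented to annotators in the first place.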
