Predicting word sense annotation agreement

High agreement is a common objective when annotating data for word senses. However, a number of factors make perfect agreement impossible, e.g. the limitations of sense inventories, the difficulty of the examples, or the interpretation preferences of the annotators. Estimating potential agreement is thus a relevant task to supplement the evaluation of sense annotations. In this article we propose two methods to predict agreement on word-annotation instances. We experiment with a continuous representation and a three-way discretization of observed agreement. In spite of the difficulty of the task, we find that different levels of agreement can be identified; in particular, low-agreement examples are easier to identify.
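
As a rough illustration of the two target representations, the sketch below computes per-instance observed agreement as the fraction of annotator pairs that assign the same sense, and then discretizes it into three bins. The 0.4/0.8 thresholds, the bin names, and the toy sense labels are illustrative assumptions, not the paper's actual setup.

```python
from itertools import combinations

def observed_agreement(labels):
    """Fraction of annotator pairs that assign the same sense to one instance."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

def discretize(agreement, low=0.4, high=0.8):
    """Map a continuous agreement score to a three-way label.
    The thresholds here are illustrative, not taken from the paper."""
    if agreement < low:
        return "low"
    if agreement < high:
        return "mid"
    return "high"

# Toy example: sense labels from five annotators for two instances.
instances = [
    ["bank.n.01", "bank.n.01", "bank.n.02", "bank.n.01", "bank.n.02"],
    ["bank.n.01", "bank.n.01", "bank.n.01", "bank.n.01", "bank.n.01"],
]
for labels in instances:
    score = observed_agreement(labels)
    print(f"{score:.2f} -> {discretize(score)}")
```

Either the continuous score or its discretized label could then serve as the prediction target for a regression or classification model over the annotated instances.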
