Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels

Word sense disambiguation aims to identify which meaning of a word is present in a given usage. Gathering word sense annotations is a laborious and difficult task. Several methods have been proposed for gathering sense annotations from large numbers of untrained annotators, with mixed results. We propose three new annotation methodologies for gathering word sense labels in which untrained annotators may select multiple senses and weight them. Our findings show that, given an appropriate annotation task, untrained workers can achieve agreement at least as high as annotators in a controlled setting and, in aggregate, produce a sense labeling of comparable quality.
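To make the weighted multi-label idea concrete, the sketch below shows one plausible way such annotations could be aggregated across workers: each annotator's weights over senses are normalized and then averaged. This is a minimal illustration assuming per-annotator weight dictionaries; the function name, normalization scheme, and sense keys are hypothetical and not the paper's actual procedure.

```python
from collections import defaultdict

def aggregate_sense_weights(annotations):
    """Combine weighted multi-sense labels from several annotators.

    `annotations` is a list with one entry per annotator; each entry maps
    a sense identifier to the non-negative weight that annotator assigned
    for a single word usage. Weights are normalized per annotator so each
    worker contributes equally, then averaged across workers.
    Returns a dict mapping sense -> aggregate weight in [0, 1].
    """
    totals = defaultdict(float)
    for worker_labels in annotations:
        mass = sum(worker_labels.values())
        if mass <= 0:
            continue  # skip annotators who assigned no weight
        for sense, weight in worker_labels.items():
            totals[sense] += weight / mass
    n = len(annotations)
    return {sense: w / n for sense, w in totals.items()} if n else {}

# Example: three hypothetical annotators labeling one usage of "cold",
# using WordNet-style sense keys (illustrative only).
workers = [
    {"cold%1:09:00": 3, "cold%1:26:00": 1},
    {"cold%1:09:00": 2},
    {"cold%1:09:00": 1, "cold%1:26:00": 1},
]
print(aggregate_sense_weights(workers))
```

In this toy example the dominant sense receives most of the aggregate weight while the secondary sense retains a smaller share, reflecting the graded, multi-label judgments the abstract describes.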
