New Insights from Coarse Word Sense Disambiguation in the Crowd

We use crowdsourcing to disambiguate 1,000 words among coarse-grained senses, the most extensive such investigation to date. Ten unique participants disambiguate each example, and, using regression, we identify surprising features that drive differential WSD accuracy: (a) a larger number of rephrasings within a sense definition is associated with higher accuracy; (b) as word frequency increases, accuracy decreases, even when the number of senses is held constant; and (c) spending more time on an example is associated with lower accuracy. We also observe that all participants are about equal in ability, that practice without feedback does not appear to lead to improvement, and that having many participants label the same example provides a partial substitute for more expensive annotation.
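As a rough illustration of the kind of analysis described above (not the authors' actual code), the sketch below fits a per-response logistic regression relating annotation correctness to the features named in the abstract: rephrasings in the sense definition, word frequency, number of senses, and time spent. The data are simulated only so the example runs end to end; the column names and effect sizes are assumptions, with effect directions chosen to mirror the reported findings.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated crowd judgments: each row is one participant's answer on one example.
rng = np.random.default_rng(0)
n = 500
judgments = pd.DataFrame({
    "n_rephrasings":  rng.integers(1, 6, n),      # rephrasings within the sense definition
    "log_word_freq":  rng.uniform(1.0, 6.0, n),   # log corpus frequency of the target word
    "n_senses":       rng.integers(2, 8, n),      # coarse senses offered for the word
    "time_spent_sec": rng.uniform(5.0, 60.0, n),  # time taken on the example
})

# Simulate correctness with effect directions matching the abstract:
# more rephrasings help; higher frequency and longer time hurt.
latent = (1.5
          + 0.4 * judgments["n_rephrasings"]
          - 0.3 * judgments["log_word_freq"]
          - 0.02 * judgments["time_spent_sec"])
judgments["correct"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-latent)))

# Logistic regression: does each feature raise or lower the odds of a
# correct disambiguation, holding the others fixed?
model = smf.logit(
    "correct ~ n_rephrasings + log_word_freq + n_senses + time_spent_sec",
    data=judgments,
).fit(disp=False)
print(model.summary())

Positive coefficients correspond to features associated with higher accuracy. With real per-judgment data, one would also typically cluster standard errors by participant and by target word, since judgments of the same item are not independent.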
