Experiments with crowdsourced re-annotation of a POS tagging data set

Crowdsourcing lets us collect multiple annotations for an item from several annotators. Typically, these are annotations for non-sequential classification tasks. While there has been some work on crowdsourcing named entity annotations, researchers have largely assumed that syntactic tasks such as part-of-speech (POS) tagging cannot be crowdsourced. This paper shows that workers can actually annotate sequential data almost as well as experts. Further, we show that the models learned from crowdsourced annotations fare as well as the models learned from expert annotations in downstream tasks.
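To make the setup concrete: collecting multiple annotations per item means each token receives several candidate POS labels that must be aggregated into a single label before training. Below is a minimal sketch of one common aggregation strategy, token-level majority voting; the function name aggregate_majority and the example sentence are ours, not the paper's, and the paper may additionally use model-based aggregation (e.g., an item-response model such as MACE) rather than plain voting.

    from collections import Counter

    def aggregate_majority(annotations):
        """Aggregate per-token POS labels from several annotators by majority vote.

        `annotations` is a list of label sequences, one per annotator, all of
        the same length (one label per token in the sentence). Ties are broken
        in favor of the label seen first, i.e., by annotator order.
        """
        aggregated = []
        for token_labels in zip(*annotations):
            winner, _count = Counter(token_labels).most_common(1)[0]
            aggregated.append(winner)
        return aggregated

    # Hypothetical example: three crowd workers tag the sentence "I can fish",
    # which is ambiguous between AUX/VERB and VERB/NOUN readings.
    annotator_a = ["PRON", "VERB", "VERB"]
    annotator_b = ["PRON", "VERB", "NOUN"]
    annotator_c = ["PRON", "AUX",  "VERB"]

    print(aggregate_majority([annotator_a, annotator_b, annotator_c]))
    # -> ['PRON', 'VERB', 'VERB']

The aggregated sequence can then be used as ordinary training data for a sequence model (e.g., a CRF tagger), which is how crowdsourced and expert annotations can be compared on downstream tasks.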
