Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing

Crowdsourcing is an effective tool for scalable data annotation in both research and enterprise contexts. Due to crowdsourcing's open participation model, quality assurance is critical to the success of any project. Present methods rely on EM-style post-processing or manual annotation of large gold standard sets. In this paper we present an automated quality assurance process that is inexpensive and scalable. Our novel process relies on programmatic gold creation to provide targeted training feedback to workers and to prevent common scamming scenarios. We find that it decreases the amount of manual work required to manage crowdsourced labor while improving the overall quality of the results.
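To make the idea concrete, the sketch below illustrates one way programmatic gold injection with targeted feedback could work: verified answers are turned into gold units, mixed into the work queue, and used to score each worker and explain their mistakes. This is a minimal illustration only; the data structures, function names (make_gold, build_queue, record_judgment), the 10% gold mix-in rate, and the accuracy threshold are assumptions for the example, not the paper's actual implementation.

```python
# Minimal sketch of programmatic gold injection with per-worker scoring and
# targeted feedback. All names, the gold mix-in rate, and the accuracy
# threshold below are illustrative assumptions, not the paper's implementation.
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Unit:
    text: str
    gold_answer: Optional[str] = None   # None marks an ordinary work unit
    feedback: Optional[str] = None      # shown to a worker who misses this gold

@dataclass
class Worker:
    worker_id: str
    gold_seen: int = 0
    gold_correct: int = 0

    @property
    def accuracy(self) -> float:
        return self.gold_correct / self.gold_seen if self.gold_seen else 1.0

def make_gold(verified: List[Tuple[str, str]]) -> List[Unit]:
    """Turn already-verified (text, answer) pairs into gold units, each
    carrying targeted feedback that explains the expected answer."""
    return [
        Unit(text=t, gold_answer=a,
             feedback="The correct answer here is '%s'. Re-read the item before judging." % a)
        for t, a in verified
    ]

def build_queue(work: List[Unit], gold: List[Unit], gold_rate: float = 0.1) -> List[Unit]:
    """Mix gold units into the work stream at roughly gold_rate (10% here)."""
    n_gold = max(1, int(len(work) * gold_rate))
    queue = work + random.sample(gold, min(n_gold, len(gold)))
    random.shuffle(queue)
    return queue

def record_judgment(worker: Worker, unit: Unit, answer: str) -> Optional[str]:
    """Score a judgment against gold; return targeted feedback when the worker
    misses a gold unit, or None for ordinary units and correct answers."""
    if unit.gold_answer is None:
        return None
    worker.gold_seen += 1
    if answer == unit.gold_answer:
        worker.gold_correct += 1
        return None
    return unit.feedback

if __name__ == "__main__":
    gold = make_gold([("acme.com sells anvils", "relevant"),
                      ("weather report for Tuesday", "not relevant")])
    work = [Unit(text="result snippet #%d" % i) for i in range(20)]
    w = Worker("w123")
    for u in build_queue(work, gold):
        fb = record_judgment(w, u, answer="relevant")  # a scammer answering uniformly
        if fb:
            print(fb)
    # A worker whose gold accuracy falls below some threshold (say 0.7) could
    # be retrained or removed; the threshold here is purely illustrative.
    print("gold accuracy for %s: %.2f" % (w.worker_id, w.accuracy))
```

In this sketch, feedback is attached to each gold unit rather than issued as a generic warning, which is the sense in which the training signal is "targeted": a worker who misses a gold item sees an explanation tied to that specific item.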
