Leveraging Crowdsourcing to Detect Improper Tasks in Crowdsourcing Marketplaces

Controlling task quality is a major challenge in crowdsourcing marketplaces. Most existing crowdsourcing services prohibit requesters from posting illegal or objectionable tasks, and marketplace operators must continuously monitor posted tasks to find such improper tasks; however, manually investigating every task is prohibitively expensive. In this paper, we report on a trial study of automatically detecting improper tasks to support marketplace operators in their monitoring activities. Experiments on real task data from a commercial crowdsourcing marketplace show that a classifier trained on operator judgments achieves high accuracy in detecting improper tasks. In addition, to reduce the operator's annotation cost and further improve classification accuracy, we consider using crowdsourcing itself for task annotation: we hire a group of (non-expert) crowdsourcing workers to monitor posted tasks and incorporate their judgments into the classifier's training data. By applying quality control techniques that account for the variability in worker reliability, we show that combining non-expert crowd judgments with expert judgments improves the accuracy of detecting improper crowdsourcing tasks.
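The approach described above amounts to a simple pipeline: estimate each non-expert worker's reliability, aggregate their noisy judgments, and train a text classifier on the expert labels plus the aggregated crowd labels. The sketch below illustrates this idea only under stated assumptions: the abstract does not specify the paper's actual quality-control model or classifier, so the reliability-weighted vote, the scikit-learn TF-IDF/logistic-regression pipeline, and all task texts and worker IDs are illustrative placeholders, not the authors' method.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical expert (operator) judgments: (task description, 1 = improper).
expert_tasks = [
    ("Transcribe the attached receipt images", 0),
    ("Post five-star reviews for my shop on a review site", 1),
    ("Categorize these news headlines by topic", 0),
    ("Create 50 email accounts and send me the passwords", 1),
]

# Hypothetical crowd (non-expert) judgments: worker id -> (task, label) pairs.
crowd_judgments = {
    "w1": [("Post five-star reviews for my shop on a review site", 1),
           ("Write promotional comments on competitor blogs", 1)],
    "w2": [("Post five-star reviews for my shop on a review site", 1),
           ("Write promotional comments on competitor blogs", 1)],
    "w3": [("Post five-star reviews for my shop on a review site", 0),
           ("Write promotional comments on competitor blogs", 0)],
}

expert_lookup = dict(expert_tasks)

def worker_accuracy(judgments):
    """Estimate a worker's reliability on tasks that also have expert labels."""
    overlap = [(t, y) for t, y in judgments if t in expert_lookup]
    if not overlap:
        return 0.5  # no evidence either way
    return sum(y == expert_lookup[t] for t, y in overlap) / len(overlap)

weights = {w: worker_accuracy(js) for w, js in crowd_judgments.items()}

# Reliability-weighted vote per task to aggregate the noisy crowd labels
# (a stand-in for more principled quality-control models such as Dawid-Skene).
votes, totals = defaultdict(float), defaultdict(float)
for w, judgments in crowd_judgments.items():
    for text, label in judgments:
        votes[text] += weights[w] * label
        totals[text] += weights[w]
crowd_tasks = [(t, int(votes[t] / totals[t] >= 0.5))
               for t in votes if totals[t] > 0]

# Train a single text classifier on the union of expert-labeled and
# aggregated crowd-labeled tasks.
texts, labels = zip(*(expert_tasks + crowd_tasks))
vectorizer = TfidfVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

new_task = ["Sign up for this service using your real credit card details"]
print(clf.predict(vectorizer.transform(new_task)))  # 1 = flag for the operator
```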
