All That Glitters Is Gold - An Attack Scheme on Gold Questions in Crowdsourcing

One of the most popular quality assurance mechanisms in paid micro-task crowdsourcing is based on gold questions: a small set of tasks for which the requester knows the correct answer and can therefore directly assess the quality of crowd work. In this paper, we show that such a mechanism is prone to an attack, easy to implement and deploy, carried out by a group of colluding crowd workers: the inherent size limit of the gold set can be exploited by building an inferential system that detects which parts of the job are more likely to be gold questions. The described attack is robust to various forms of randomisation and programmatic generation of gold questions. We present the architecture of the proposed system, composed of a browser plug-in and an external server used to share information, and briefly introduce its potential evolution towards a decentralised implementation. We implement and experimentally validate the gold-detection system using real-world data from a popular crowdsourcing platform. Finally, we discuss the economic and sociological implications of this kind of attack.
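
The intuition behind the attack can be illustrated with a minimal sketch. The names below (SharedGoldTracker, fingerprint, report, is_suspected_gold) and the worker-count threshold are illustrative assumptions, not the paper's actual implementation: colluding workers report a fingerprint of every task they see to a shared store, and because the gold set is necessarily small, the same gold questions recur across many workers, so high-recurrence fingerprints are flagged as likely gold.

```python
# A minimal sketch of the shared gold-detection idea, under the assumptions
# stated above. Not the authors' code: all names and thresholds are hypothetical.

import hashlib
from collections import defaultdict


def fingerprint(task_text: str) -> str:
    """Normalise the task text and hash it so identical tasks collide."""
    normalised = " ".join(task_text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class SharedGoldTracker:
    """Server-side store shared by colluding workers (hypothetical)."""

    def __init__(self, min_distinct_workers: int = 3):
        # A task reported by at least this many distinct workers is suspect,
        # because non-gold tasks are rarely shown to many workers.
        self.min_distinct_workers = min_distinct_workers
        self._seen_by = defaultdict(set)

    def report(self, worker_id: str, task_text: str) -> None:
        """Called (e.g. by a browser plug-in) whenever a worker views a task."""
        self._seen_by[fingerprint(task_text)].add(worker_id)

    def is_suspected_gold(self, task_text: str) -> bool:
        """True if enough distinct workers have already seen this task."""
        seen = self._seen_by.get(fingerprint(task_text), set())
        return len(seen) >= self.min_distinct_workers


if __name__ == "__main__":
    tracker = SharedGoldTracker(min_distinct_workers=3)
    gold = "Is the sentiment of this review positive? 'Great product!'"
    for worker in ("w1", "w2", "w3"):
        tracker.report(worker, gold)
    tracker.report("w4", "A one-off task seen by a single worker")
    print(tracker.is_suspected_gold(gold))                                    # True
    print(tracker.is_suspected_gold("A one-off task seen by a single worker"))  # False
```

Exact-match hashing is only the simplest choice here; a near-duplicate fingerprint (e.g. over word n-grams) would be needed to cope with programmatically varied gold questions, which is the scenario the paper argues the attack remains robust to.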
