An Analysis of the Use of Qualifications on the Amazon Mechanical Turk Online Labor Market

Several human computation systems use crowdsourcing labor markets to recruit workers. However, ensuring that the results produced by workers are of sufficiently high quality remains a challenge. This is particularly difficult in markets based on micro-tasks, where the quality of the results must be assessed automatically. Pre-selecting suitable workers is one mechanism that can improve the quality of the results. Pre-selection can be based on workers' personal information, on workers' historical behavior in the system, or on customized qualification tasks. However, little is known about how requesters use these mechanisms in practice. This study advances current knowledge of worker pre-selection by analyzing data collected from the Amazon Mechanical Turk platform on how requesters use qualifications to this end. Furthermore, we investigate the influence of customized qualification tasks on the quality of the results produced by workers. The results show that most jobs (93.6%) use some mechanism for pre-selecting workers. While most requesters use the standard qualifications provided by the system, the few requesters that submit most of the jobs prefer to use customized ones. Regarding worker behavior, we identified a positive and significant, although weak, correlation between a worker's propensity to possess a particular qualification and both the number of tasks that require the qualification and the reward offered for those tasks. To assess the impact of customized qualifications on the quality of the results produced, we conducted experiments with three different types of tasks, using both unqualified and qualified workers. The results show that qualified workers generally provide more accurate answers than unqualified ones.
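To make the pre-selection mechanism concrete, the sketch below illustrates how a requester could attach qualification requirements to a task on Amazon Mechanical Turk. This is an illustrative example, not code from the study: it assumes the boto3 MTurk client, the sandbox endpoint, placeholder QuestionForm/AnswerKey/HTMLQuestion XML documents, and an illustrative passing score of 80 for the customized screening test; the two system qualification type IDs shown are the ones AMT documents for approval rate and locale, but they should be verified against the current API reference.

```python
# Illustrative sketch (assumptions noted above): pre-selecting AMT workers with
# system-provided and customized qualifications via the boto3 MTurk client.
import boto3

# Placeholders: real QuestionForm/AnswerKey/HTMLQuestion XML documents are required.
SCREENING_TEST_XML = "<QuestionForm>...</QuestionForm>"   # hypothetical screening test
SCREENING_ANSWER_KEY_XML = "<AnswerKey>...</AnswerKey>"   # hypothetical answer key
HIT_QUESTION_XML = "<HTMLQuestion>...</HTMLQuestion>"     # hypothetical task content

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",  # sandbox
)

# 1) Customized qualification: a requester-defined test that workers must pass.
custom_qual = mturk.create_qualification_type(
    Name="Task-specific screening test",
    Description="Checks whether a worker can follow the task guidelines.",
    QualificationTypeStatus="Active",
    Test=SCREENING_TEST_XML,
    AnswerKey=SCREENING_ANSWER_KEY_XML,
    TestDurationInSeconds=600,
)
custom_qual_id = custom_qual["QualificationType"]["QualificationTypeId"]

# 2) A task (HIT) restricted to workers who satisfy both system and custom qualifications.
mturk.create_hit(
    Title="Label 10 images",
    Description="Choose the category that best describes each image.",
    Reward="0.05",
    MaxAssignments=3,
    LifetimeInSeconds=86400,
    AssignmentDurationInSeconds=600,
    Question=HIT_QUESTION_XML,
    QualificationRequirements=[
        {  # system qualification: approval rate of at least 95%
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [95],
            "ActionsGuarded": "Accept",
        },
        {  # system qualification: worker located in the United States
            "QualificationTypeId": "00000000000000000071",
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": "US"}],
            "ActionsGuarded": "Accept",
        },
        {  # customized qualification: screening-test score of at least 80 (illustrative)
            "QualificationTypeId": custom_qual_id,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [80],
            "ActionsGuarded": "Accept",
        },
    ],
)
```

In this sketch, the system qualifications filter on the worker's history and profile, while the customized qualification corresponds to the requester-defined qualification tasks discussed in the study.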
