Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection

The suitability of crowdsourcing for solving a variety of problems has been widely investigated. Yet, there is still a lack of understanding of the distinct behavior and performance of workers within microtasks. In this paper, we first introduce a fine-grained, data-driven worker typology based on different dimensions and derived from the behavioral traces of workers. Next, we propose and evaluate novel models of crowd worker behavior and show the benefits of behavior-based worker pre-selection using machine learning models. We also study the effect of task complexity on worker behavior. Finally, we evaluate our novel typology-based worker pre-selection method in image transcription and information finding tasks, involving crowd workers completing 1,800 HITs. Our proposed method for worker pre-selection leads to higher-quality results than the standard practice of using qualification or pre-screening tests: it increased accuracy by nearly 7% over the baseline in image transcription tasks and by almost 10% in information finding tasks, without a significant difference in task completion time. Our findings have important implications for crowdsourcing systems where a worker’s behavioral type is unknown prior to participation in a task. We highlight the potential of leveraging worker types to identify and aid those workers who require further training to improve their performance. Having proposed a powerful automated mechanism to detect worker types, we reflect on promoting fairness, trust and transparency in microtask crowdsourcing platforms.
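
The abstract describes training machine learning models on workers' behavioral traces and using their predictions to pre-select workers before task assignment. The sketch below is a minimal illustration of that general pipeline, not the authors' implementation: the behavioral features (time on task, mouse movements, key presses, scroll events, tab switches), the synthetic data, and the binary "produced acceptable work" label are all assumptions introduced here for demonstration.

```python
# Minimal sketch of behavior-based worker pre-selection (illustrative only).
# Feature names and the synthetic data below are assumptions for demonstration;
# they are not the features, data, or model used in the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n_workers = 500

# Hypothetical behavioral-trace features logged per worker during past HITs:
# time on task (s), mouse movement events, key presses, scroll events, tab switches.
X = np.column_stack([
    rng.normal(120, 30, n_workers),   # time_on_task_s
    rng.poisson(200, n_workers),      # mouse_moves
    rng.poisson(80, n_workers),       # key_presses
    rng.poisson(15, n_workers),       # scroll_events
    rng.poisson(2, n_workers),        # tab_switches
])

# Synthetic label: 1 = produced acceptable work, 0 = did not.
# In practice this would come from gold questions or expert judgments.
y = ((X[:, 0] > 100) & (X[:, 2] > 60)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Train a classifier on the behavioral features of previously observed workers.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Pre-selection: admit a new worker to the task only if the model
# predicts reliable behavior from their logged traces.
new_worker = np.array([[135.0, 220, 90, 12, 1]])
admit = bool(clf.predict(new_worker)[0])
print("Admit worker to task:", admit)
```

The paper's typology-based pre-selection reasons over richer behavioral dimensions and worker types than this binary gate; the sketch only conveys the overall flow of logging traces, training a model, and conditioning task access on its predictions.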
