All Those Wasted Hours: On Task Abandonment in Crowdsourcing

Crowdsourcing has become a standard methodology for collecting manually annotated data, such as relevance judgments, at scale. On crowdsourcing platforms like Amazon MTurk or FigureEight, crowd workers select tasks to work on based on dimensions such as task reward and requester reputation. Requesters then receive the judgments of workers who self-selected into the tasks and completed them successfully. Many crowd workers, however, preview tasks or begin working on them, reaching varying stages of completion without ever submitting their work. Such behavior results in unrewarded effort that remains invisible to requesters. In this paper, we conduct the first investigation into the phenomenon of task abandonment, the act of workers previewing or beginning a task and deciding not to complete it. We follow a three-fold methodology that includes 1) investigating the prevalence and causes of task abandonment by means of a survey across different crowdsourcing platforms, 2) data-driven analyses of logs collected during a large-scale relevance judgment experiment, and 3) controlled experiments measuring the effect of different dimensions on abandonment. Our results show that task abandonment is a widespread phenomenon. Apart from accounting for a considerable amount of wasted human effort, this has important implications for the hourly wages of workers, as they are not rewarded for tasks they do not complete. We also show how task abandonment may strongly affect the use of the collected data (for example, in the evaluation of IR systems).
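
As an illustration of the kind of log-based analysis the abstract describes, the following minimal sketch shows how an abandonment rate could be computed from worker session logs. It is not taken from the paper; the log schema, field names, and sample data are hypothetical assumptions introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass
class TaskSession:
    """One worker's interaction with one task (hypothetical log schema)."""
    worker_id: str
    task_id: str
    previewed: bool   # worker opened the task preview
    started: bool     # worker began working on the task
    submitted: bool   # worker submitted the completed task

def abandonment_rate(sessions: list[TaskSession]) -> float:
    """Fraction of previewed-or-started sessions that were never submitted."""
    engaged = [s for s in sessions if s.previewed or s.started]
    if not engaged:
        return 0.0
    abandoned = sum(1 for s in engaged if not s.submitted)
    return abandoned / len(engaged)

# Toy example: two of the three engaged sessions were abandoned.
logs = [
    TaskSession("w1", "t1", previewed=True, started=True, submitted=True),
    TaskSession("w2", "t1", previewed=True, started=False, submitted=False),
    TaskSession("w3", "t2", previewed=True, started=True, submitted=False),
]
print(f"Abandonment rate: {abandonment_rate(logs):.2f}")  # prints 0.67
```

In practice, such a rate could be broken down by task reward, task length, or requester to study which dimensions drive abandonment, along the lines of the controlled experiments mentioned above.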
