Sprout: Crowd-Powered Task Design for Crowdsourcing

While crowdsourcing enables data collection at scale, ensuring high-quality data remains a challenge. In particular, effective task design underlies nearly every reported crowdsourcing success, yet it remains difficult to accomplish. Task design is hard because it involves a costly iterative process: identifying the kind of work output one wants, conveying this information to workers, observing worker performance, understanding what remains ambiguous, revising the instructions, and repeating until the resulting output is satisfactory. To facilitate this process, we propose a novel meta-workflow that helps requesters optimize crowdsourcing task designs, along with Sprout, our open-source tool that implements this workflow. Sprout improves task designs by (1) eliciting points of confusion from crowd workers, (2) enabling requesters to quickly understand these misconceptions and the overall space of worker questions, and (3) guiding requesters to improve the task design in response. We report a user study with two labeling tasks demonstrating that requesters strongly prefer Sprout and produce higher-rated instructions with it than with current best practices for creating gated instructions (instructions plus a workflow for training and testing workers). We also offer a set of design recommendations for future tools that support crowdsourcing task design.

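To make the iterative process described above concrete, the sketch below shows the design-refine loop in Python. It is a minimal illustration of the meta-workflow only: the callables (run_pilot, elicit_confusions, summarize, revise, is_satisfactory) are hypothetical placeholders standing in for platform- and tool-specific steps, not Sprout's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskDesign:
    """A requester's task design: instructions plus any gold examples shown to workers."""
    instructions: str
    gold_examples: List[str] = field(default_factory=list)


def refine_task_design(
    design: TaskDesign,
    run_pilot: Callable[[TaskDesign], list],            # convey the task to workers, collect output
    elicit_confusions: Callable[[list], list],          # step (1): gather workers' points of confusion
    summarize: Callable[[list], list],                  # step (2): group confusions into themes for the requester
    revise: Callable[[TaskDesign, list], TaskDesign],   # step (3): update instructions in response
    is_satisfactory: Callable[[list], bool],            # requester's stopping criterion on output quality
    max_rounds: int = 5,
) -> TaskDesign:
    """Iterate pilot -> elicit confusion -> review -> revise until the output is acceptable."""
    for _ in range(max_rounds):
        output = run_pilot(design)
        if is_satisfactory(output):
            break
        confusions = elicit_confusions(output)
        themes = summarize(confusions)
        design = revise(design, themes)
    return design
```

Passing the individual steps as callables simply emphasizes the paper's framing: the loop itself is generic, while tools like Sprout supply concrete support for eliciting confusion, summarizing it, and guiding revisions.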