Shepherding the crowd yields better work

Micro-task platforms provide massively parallel, on-demand labor. However, reliably achieving high-quality work is difficult because online workers may behave irresponsibly, misunderstand the task, or lack the necessary skills. This paper investigates whether timely, task-specific feedback helps crowd workers learn, persevere, and produce better results. We examine this question through Shepherd, a feedback system for crowdsourced work. In a between-subjects study with three conditions, crowd workers wrote consumer reviews for six products they own. Participants in the None condition received no immediate feedback, consistent with most current crowdsourcing practices; participants in the Self-assessment condition judged their own work; and participants in the External assessment condition received expert feedback. Self-assessment alone yielded better overall work than the None condition and helped workers improve over time. External assessment produced the same benefits and, in addition, led participants to revise their work more. We conclude by discussing interaction and infrastructure approaches for integrating real-time assessment into online work.
