Fantasktic: Improving Quality of Results for Novice Crowdsourcing Users

Crowdsourcing platforms such as Amazon’s Mechanical Turk and MobileWorks offer great potential for users to solve computationally difficult problems with human agents. However, the quality of crowdsourcing responses is directly tied to the task description. Creating high-quality tasks today requires significant expertise, which prevents novice users from receiving reasonable results without iterating multiple times over their description. This paper asks the following research question: How can automated task design techniques help novice users create better tasks and receive higher quality responses from the crowd? We investigate this question by introducing “Fantasktic”, a system to explore how to better support end users in creating successful crowdsourcing tasks. Fantasktic introduces three major task design techniques: 1) a guided task specification interface that provides guidelines and recommendations to end users throughout the process, 2) a preview interface that presents users with their task from the perspective of an agent, and 3) an automated way to generate task tutorials for agents based on sample answers provided by end users. Our evaluation investigates the impact of each of these techniques on result quality by comparing their performance with one another and against expert task specifications taken from a business that crowdsources these tasks on MobileWorks. We tested two common crowdsourcing tasks, digitizing business cards and searching websites for contact email addresses, with ten users who had no prior crowdsourcing experience. We generated a total of 8800 tasks based on the users’ instructions, which we submitted to a crowdsourcing platform, where they were completed by 440 unique agents. We find a significant improvement for instructions created with the guided task specification interface, which show reduced variation in answer formats and more frequent agreement on answers among agents. We do not find evidence of significant improvement from the task preview or the agent tutorials. Although expert tasks still perform better, we show that novice users can receive higher-quality results when supported by a guided task specification interface.
