Quality through flow and immersion: gamifying crowdsourced relevance assessments

Crowdsourcing is a market of steadily growing importance upon which both academia and industry increasingly rely. However, this market appears to be inherently infested with a significant share of malicious workers who try to maximise their profits through cheating or sloppiness, undermining the very merits crowdsourcing has come to represent. Based on previous experience as well as psychological insights, we propose the use of a game to attract and retain a larger share of reliable workers for frequently requested crowdsourcing tasks such as relevance assessments and clustering. In a large-scale comparative study conducted using recent TREC data, we investigate the performance of traditional HIT designs and of a game-based alternative that achieves high quality at significantly lower pay rates while attracting fewer malicious submissions.
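The quality comparison described above ultimately rests on matching crowd-supplied labels against TREC gold judgements. Below is a minimal sketch of how such scoring might be carried out, assuming a standard TREC qrels file and a hypothetical CSV of worker labels with columns `worker_id`, `topic`, `docid`, `label`; the file layout and the per-worker and majority-vote metrics are illustrative assumptions, not the paper's actual evaluation pipeline.

```python
"""Sketch: scoring crowdsourced relevance labels against TREC qrels.

Assumes a standard qrels file (topic, iteration, docid, relevance) and a
hypothetical worker-label CSV; both formats are illustrative only.
"""

import csv
from collections import Counter, defaultdict


def load_qrels(path):
    """Map (topic, docid) -> binary gold relevance from a TREC qrels file."""
    gold = {}
    with open(path) as f:
        for line in f:
            topic, _iteration, docid, rel = line.split()
            gold[(topic, docid)] = int(int(rel) > 0)
    return gold


def score_workers(qrels_path, labels_path):
    """Return per-worker accuracy and majority-vote accuracy against the gold set."""
    gold = load_qrels(qrels_path)
    per_worker = defaultdict(lambda: [0, 0])   # worker -> [correct, judged]
    votes = defaultdict(list)                  # (topic, docid) -> collected labels

    with open(labels_path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["topic"], row["docid"])
            if key not in gold:
                continue  # ignore documents without a gold judgement
            label = int(row["label"])
            votes[key].append(label)
            stats = per_worker[row["worker_id"]]
            stats[0] += int(label == gold[key])
            stats[1] += 1

    worker_acc = {w: c / n for w, (c, n) in per_worker.items() if n}
    majority_correct = sum(
        Counter(v).most_common(1)[0][0] == gold[key] for key, v in votes.items()
    )
    majority_acc = majority_correct / len(votes) if votes else 0.0
    return worker_acc, majority_acc
```

In a setup like this, per-worker accuracies could feed cheat-detection thresholds or cost-quality comparisons between a conventional HIT design and a game-based one, though the paper's own aggregation scheme may differ.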
