Crowdsourcing Control: Moving Beyond Multiple Choice

To ensure quality results from crowdsourced tasks, requesters often aggregate worker responses, using one of many strategies to infer the correct answer from the set of noisy responses. However, all current models assume prior knowledge of every possible outcome of the task. While this is a reasonable assumption for tasks that can be posed as multiple-choice questions (e.g., n-ary classification), we observe that many tasks do not naturally fit this paradigm and instead demand a free-response formulation in which the outcome space is infinite (e.g., audio transcription). We model such tasks with a novel probabilistic graphical model, and we design and implement LAZYSUSAN, a decision-theoretic controller that dynamically requests responses as necessary to infer answers to these tasks. We also design an EM algorithm that jointly learns the parameters of our model while inferring the correct answers to multiple tasks at a time. Live experiments on Amazon Mechanical Turk demonstrate the superiority of LAZYSUSAN at solving SAT Math questions: it eliminates 83.2% of the error and achieves greater net utility than the state-of-the-art strategy, majority voting. We also show in live experiments that our EM algorithm outperforms majority voting on a visualization task of our own design.
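The abstract does not reproduce the paper's graphical model or the LAZYSUSAN controller, so the sketch below is only a rough illustration of the kind of joint inference it describes: a Dawid-Skene-style EM loop adapted to free-response tasks, where the candidate answer set for each task is simply the set of distinct strings workers submitted rather than a fixed choice set. The function name `em_aggregate`, the single per-worker accuracy parameter, and the uniform-error assumption are all illustrative choices, not the paper's actual model.

```python
from collections import defaultdict

def em_aggregate(responses, n_iters=20, smoothing=1e-2):
    """EM-style aggregation for free-response tasks (illustrative sketch).

    responses: dict mapping task_id -> list of (worker_id, answer) pairs.
    Unlike fixed multiple-choice models, the candidate answer set for each
    task is just the set of distinct answers workers submitted.
    Returns (answers, accuracy): a best-guess answer per task and an
    estimated probability of answering correctly per worker.
    """
    workers = {w for resp in responses.values() for w, _ in resp}
    accuracy = {w: 0.7 for w in workers}  # arbitrary optimistic initialization

    posteriors = {}
    for _ in range(n_iters):
        # E-step: posterior over observed candidate answers for each task,
        # weighting each vote by the worker's current accuracy estimate.
        posteriors = {}
        for task, resp in responses.items():
            candidates = {a for _, a in resp}
            scores = {}
            for cand in candidates:
                score = 1.0
                for w, a in resp:
                    p = accuracy[w]
                    # Assumption: a wrong worker's errors spread uniformly
                    # over the other observed candidates.
                    wrong = (1 - p) / max(len(candidates) - 1, 1)
                    score *= p if a == cand else wrong
                scores[cand] = score
            total = sum(scores.values()) or 1.0
            posteriors[task] = {c: s / total for c, s in scores.items()}

        # M-step: re-estimate each worker's accuracy as the expected
        # fraction of their responses matching the posterior answer.
        hits, counts = defaultdict(float), defaultdict(float)
        for task, resp in responses.items():
            for w, a in resp:
                hits[w] += posteriors[task].get(a, 0.0)
                counts[w] += 1.0
        accuracy = {w: (hits[w] + smoothing) / (counts[w] + 2 * smoothing)
                    for w in workers}

    answers = {t: max(p, key=p.get) for t, p in posteriors.items()}
    return answers, accuracy

# Example: two of three workers agree, so "42" wins under the learned weights.
votes = {"q1": [("w1", "42"), ("w2", "42"), ("w3", "41")]}
print(em_aggregate(votes)[0])  # {'q1': '42'}
```

Note that this sketch handles only static aggregation; the decision-theoretic part of LAZYSUSAN, which decides whether requesting another response is worth its cost, sits on top of posteriors like these and is not modeled here.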
