Adaptive Task Assignment for Crowdsourced Classification

Crowdsourcing markets have gained popularity as a tool for inexpensively collecting data from diverse populations of workers. Classification tasks, in which workers provide labels (such as "offensive" or "not offensive") for instances (such as websites), are among the most common tasks posted, but due to human error and the prevalence of spam, the labels collected are often noisy. This problem is typically addressed by collecting labels for each instance from multiple workers and aggregating them to infer the true label, but the question of how to choose which tasks to assign to each worker is often overlooked. We investigate the problem of task assignment and label inference for heterogeneous classification tasks. By applying online primal-dual techniques, we derive a provably near-optimal adaptive assignment algorithm. We show that adaptively assigning workers to tasks can lead to more accurate predictions at a lower cost when the available workers are diverse.
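
To make the aggregation and adaptive-assignment ideas concrete, here is a minimal Python sketch. It is not the paper's primal-dual algorithm: it simply combines worker labels by weighted majority vote and assigns the next arriving worker to the instance whose current vote is least certain. The function names, worker weights, and budget parameters are illustrative assumptions.

```python
from collections import defaultdict

def weighted_majority(labels):
    """Combine (label, worker_weight) pairs into one prediction.

    Returns the winning label and a confidence score: the normalized
    margin between the top two vote totals (0.0 = tie, 1.0 = unanimous).
    """
    if not labels:
        return None, 0.0
    scores = defaultdict(float)
    for label, weight in labels:
        scores[label] += weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, (best_score - runner_up) / sum(scores.values())

def assign_next_task(collected, budget_used, budget):
    """Pick the instance whose current vote is least certain.

    collected: dict mapping instance id -> list of (label, weight) pairs.
    Returns the instance to show to the next arriving worker, or None
    once the labeling budget is exhausted.
    """
    if budget_used >= budget:
        return None
    return min(collected, key=lambda inst: weighted_majority(collected[inst])[1])

# Illustrative usage: two websites, one with conflicting labels.
collected = {
    "site_a": [("offensive", 0.9), ("not offensive", 0.6)],
    "site_b": [("not offensive", 0.8), ("not offensive", 0.7)],
}
print(assign_next_task(collected, budget_used=4, budget=10))  # -> "site_a"
print(weighted_majority(collected["site_b"]))                 # -> ("not offensive", 1.0)
```

In this simplified scheme the worker weights would come from some estimate of worker reliability (e.g., performance on gold-standard tasks); the paper's approach instead derives assignments from an online primal-dual formulation with provable near-optimality guarantees.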
