Iterative Learning for Reliable Crowdsourcing Systems

Crowdsourcing systems, in which tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large-scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way, such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., the number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm significantly outperforms majority voting and, in fact, is asymptotically optimal, as shown by comparison to an oracle that knows the reliability of every worker.
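To make the baseline concrete, here is a minimal sketch of majority voting over redundantly assigned tasks. It is not the paper's algorithm, only the simple aggregation scheme the abstract compares against. The function names, the task ids, and the +1/-1 answer encoding are assumptions made for illustration, and each task is assumed to have at least one answer.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer for each task.

    `answers` maps a task id to the list of answers collected for it.
    Among tied answers, the one that appears first in the list wins.
    """
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in answers.items()}


def total_assignments(answers):
    """Count task assignments, i.e. the total price in the abstract's sense."""
    return sum(len(labels) for labels in answers.values())


# Hypothetical data: three tasks, each assigned to three workers.
collected = {
    "task_a": [+1, +1, -1],
    "task_b": [-1, -1, -1],
    "task_c": [-1, +1, +1],
}
estimates = majority_vote(collected)
cost = total_assignments(collected)
```

Majority voting treats every worker as equally reliable, so raising confidence means raising the number of assignments per task, and with it the total price.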
