Optimality of Belief Propagation for Crowdsourced Classification: Proof for Arbitrary Number of Per-worker Assignments

Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid (or even unpaid) workers. We study the problem of recovering the true labels from possibly erroneous crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. In our previous work, we closed this gap under a canonical assumption that each worker is assigned only two tasks, i.e., r = 2, and each task is assigned to sufficiently but constantly many workers, ℓ ≥ C_r. In this work, we further remove the condition on r and show that for all r ≥ 1, Belief Propagation (BP) exactly matches a lower bound on the fundamental limit whenever ℓ ≥ C_r. The guaranteed optimality of BP is the strongest possible in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. In the general setting, regardless of the number of workers assigned to each task, we establish a dominance result for BP: it outperforms all existing algorithms with known provable guarantees. Experimental results suggest that BP is close to optimal in all regimes considered, while every other algorithm exhibits suboptimal performance in certain regimes.
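
To make the setting concrete, the following is a minimal sketch (not the paper's implementation) of sum-product Belief Propagation on the task-worker bipartite graph under the binary one-coin Dawid-Skene model, assuming a uniform prior on task labels and a known finite prior over worker reliabilities. The function name bp_dawid_skene, the reliability grid rel_vals/rel_prior, and the toy responses at the end are illustrative assumptions, not taken from the paper.

```python
def bp_dawid_skene(responses, n_tasks, n_workers, rel_vals, rel_prior, n_iter=10):
    """Sum-product BP for binary labels under the one-coin Dawid-Skene model.

    responses : list of (task, worker, answer) triples, answer in {+1, -1}
    rel_vals  : possible worker reliabilities p (probability of a correct answer)
    rel_prior : prior weight of each reliability value (sums to 1)
    Returns a list of estimated labels in {+1, -1}, one per task.
    """
    tasks_of = [[] for _ in range(n_workers)]   # tasks answered by each worker
    workers_of = [[] for _ in range(n_tasks)]   # workers answering each task
    answer = {}
    for i, u, a in responses:
        tasks_of[u].append(i)
        workers_of[i].append(u)
        answer[(i, u)] = a

    def lik(p, a, t):
        # P(worker with reliability p answers a | true label is t)
        return p if a == t else 1.0 - p

    # Messages are stored as the probability that the task label is +1.
    m_wt = {e: 0.5 for e in answer}  # worker -> task
    m_tw = {e: 0.5 for e in answer}  # task -> worker

    for _ in range(n_iter):
        # Task -> worker: combine the other workers' opinions about this task.
        for (i, u) in answer:
            plus = minus = 1.0
            for v in workers_of[i]:
                if v != u:
                    plus *= m_wt[(i, v)]
                    minus *= 1.0 - m_wt[(i, v)]
            m_tw[(i, u)] = plus / (plus + minus)

        # Worker -> task: average over the worker's unknown reliability,
        # weighting each value by how well it explains the worker's other answers.
        for (i, u) in answer:
            out = {+1: 0.0, -1: 0.0}
            for p, w in zip(rel_vals, rel_prior):
                evidence = 1.0
                for j in tasks_of[u]:
                    if j != i:
                        q = m_tw[(j, u)]
                        evidence *= (q * lik(p, answer[(j, u)], +1)
                                     + (1.0 - q) * lik(p, answer[(j, u)], -1))
                for t in (+1, -1):
                    out[t] += w * lik(p, answer[(i, u)], t) * evidence
            m_wt[(i, u)] = out[+1] / (out[+1] + out[-1])

    # Decision: sign of the (uniform-prior) posterior marginal of each task.
    labels = []
    for i in range(n_tasks):
        plus = minus = 1.0
        for u in workers_of[i]:
            plus *= m_wt[(i, u)]
            minus *= 1.0 - m_wt[(i, u)]
        labels.append(+1 if plus >= minus else -1)
    return labels


# Toy usage (illustrative numbers): 3 tasks, 2 workers, a spammer-or-reliable prior.
resp = [(0, 0, +1), (1, 0, +1), (2, 0, -1),
        (0, 1, +1), (1, 1, -1), (2, 1, -1)]
print(bp_dawid_skene(resp, n_tasks=3, n_workers=2,
                     rel_vals=[0.5, 0.9], rel_prior=[0.3, 0.7]))
```

Restricting the reliability prior to finitely many values keeps the marginalization over each worker's reliability an explicit finite sum; with a continuous prior, the same update would integrate over the reliability instead.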
