A robust inference algorithm for crowdsourced categorization

With the rapid growth of crowdsourcing systems, class labels for supervised learning can be obtained easily from crowdsourcing platforms. Because labels obtained from crowds are usually noisy, owing to the imperfect reliability of non-expert workers, we let multiple workers label the same object and then estimate its true label with a ground truth inference algorithm, so that the inferred integrated labels are of high quality. In this paper, we propose a novel ground truth inference algorithm based on the EM algorithm, which not only infers the true label of each instance but also simultaneously estimates the reliability of each worker and the difficulty of each instance. Experimental results on seven real-world crowdsourcing datasets show that the proposed algorithm outperforms eight state-of-the-art algorithms.
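The abstract does not spell out the model's update equations, so the sketch below is only a minimal illustration of the kind of EM loop described: binary labels, a per-worker reliability parameter `alpha`, and a per-instance (inverse) difficulty parameter `beta`, closest in spirit to the GLAD model of Whitehill et al. (2009). All names (`glad_like_em`, `alpha`, `beta`), the logistic noise model, and the gradient-based M-step are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def glad_like_em(labels, n_items, n_workers, n_iters=50, lr=0.05):
    """EM over binary crowd labels with per-worker ability (alpha) and
    per-item inverse difficulty (beta): P(correct) = sigmoid(alpha_j * beta_i).

    labels: list of (item, worker, label) triples with label in {0, 1}.
    Returns posterior P(z_i = 1), alpha, beta.
    """
    alpha = np.ones(n_workers)        # worker reliability
    beta = np.ones(n_items)           # inverse item difficulty (kept positive)
    post = np.full(n_items, 0.5)      # P(z_i = 1)

    items = np.array([i for i, _, _ in labels])
    workers = np.array([j for _, j, _ in labels])
    y = np.array([l for _, _, l in labels], dtype=float)

    for _ in range(n_iters):
        # E-step: posterior over true labels given current alpha and beta.
        p_correct = np.clip(sigmoid(alpha[workers] * beta[items]), 1e-6, 1 - 1e-6)
        log_like1 = np.zeros(n_items)   # log P(observations | z_i = 1)
        log_like0 = np.zeros(n_items)   # log P(observations | z_i = 0)
        np.add.at(log_like1, items,
                  np.where(y == 1, np.log(p_correct), np.log(1 - p_correct)))
        np.add.at(log_like0, items,
                  np.where(y == 0, np.log(p_correct), np.log(1 - p_correct)))
        post = sigmoid(np.clip(log_like1 - log_like0, -30, 30))

        # M-step: one gradient step on the expected complete-data log-likelihood.
        # e_correct is the expected indicator that worker j labelled item i correctly.
        e_correct = post[items] * (y == 1) + (1 - post[items]) * (y == 0)
        grad_x = e_correct - p_correct              # derivative w.r.t. alpha_j * beta_i
        grad_alpha = np.zeros(n_workers)
        grad_beta = np.zeros(n_items)
        np.add.at(grad_alpha, workers, grad_x * beta[items])
        np.add.at(grad_beta, items, grad_x * alpha[workers])
        alpha += lr * grad_alpha
        beta = np.maximum(1e-3, beta + lr * grad_beta)

    return post, alpha, beta


if __name__ == "__main__":
    # Toy example: 3 items, 3 workers; worker 2 tends to disagree with the others.
    votes = [(0, 0, 1), (0, 1, 1), (0, 2, 0),
             (1, 0, 0), (1, 1, 0), (1, 2, 1),
             (2, 0, 1), (2, 1, 1), (2, 2, 0)]
    post, alpha, beta = glad_like_em(votes, n_items=3, n_workers=3)
    print(np.round(post, 2), np.round(alpha, 2))
```

A gradient step stands in for the M-step here because, unlike the confusion-matrix model of Dawid and Skene (1979), worker reliability and instance difficulty interact through the sigmoid and admit no closed-form maximizer.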
