Multi-Label Truth Inference for Crowdsourcing Using Mixture Models

When acquiring labels from crowdsourcing platforms, a task may be designed to include multiple labels, and each label may take one of several distinct values; this setting is known as multi-class multi-label annotation. To improve label quality, requesters usually have each task completed independently by a group of heterogeneous crowdsourced workers, and the true values of the task's multiple labels are then inferred from these repeated noisy labels. In this paper, we propose two novel probabilistic models, MCMLI and MCMLD, to address the multi-class multi-label inference problem in crowdsourcing. MCMLI assumes that the labels of each task are mutually independent, whereas MCMLD uses a mixture of independent multinoulli distributions to capture the correlation among labels. Both models jointly infer the multiple true labels of each instance and estimate the reliability of crowdsourced workers, modeled by a set of confusion matrices, using an expectation-maximization (EM) algorithm. Experiments on three typical crowdsourcing scenarios and a real-world dataset show that our proposed models significantly outperform existing competitive alternatives. When the labels are strongly correlated, MCMLD substantially outperforms MCMLI. Furthermore, both models can easily be simplified to one-coin variants, which are more advantageous when errors are uniformly distributed or labels are sparse.
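The abstract gives no implementation, but the confusion-matrix-plus-EM machinery it describes follows the classical Dawid-Skene formulation. The sketch below is a minimal illustration in Python/NumPy of that per-label EM loop, not the authors' MCMLI/MCMLD code; the function name `em_truth_inference`, the smoothing constant, and the use of -1 to mark missing annotations are all assumptions made for this example. MCMLI can be read as running such a loop independently for each label, while MCMLD additionally mixes multinoulli components to capture correlation among labels.

```python
# Minimal Dawid-Skene-style EM for one categorical label (illustrative sketch,
# not the paper's implementation). Each worker is modeled by a confusion matrix.
import numpy as np

def em_truth_inference(annotations, n_classes, n_iter=50, smoothing=1e-2):
    """annotations: array of shape (n_tasks, n_workers) with entries in
    {0, ..., n_classes-1}, or -1 where a worker did not label a task."""
    n_tasks, n_workers = annotations.shape

    # Initialize posteriors over true labels with soft majority voting.
    posterior = np.full((n_tasks, n_classes), smoothing)
    for i in range(n_tasks):
        for j in range(n_workers):
            if annotations[i, j] >= 0:
                posterior[i, annotations[i, j]] += 1.0
    posterior /= posterior.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: re-estimate the class prior and one confusion matrix per worker.
        prior = posterior.mean(axis=0)
        confusion = np.full((n_workers, n_classes, n_classes), smoothing)
        for i in range(n_tasks):
            for j in range(n_workers):
                if annotations[i, j] >= 0:
                    confusion[j, :, annotations[i, j]] += posterior[i]
        confusion /= confusion.sum(axis=2, keepdims=True)

        # E-step: recompute the posterior over the true label of each task.
        log_post = np.tile(np.log(prior), (n_tasks, 1))
        for i in range(n_tasks):
            for j in range(n_workers):
                if annotations[i, j] >= 0:
                    log_post[i] += np.log(confusion[j, :, annotations[i, j]])
        log_post -= log_post.max(axis=1, keepdims=True)
        posterior = np.exp(log_post)
        posterior /= posterior.sum(axis=1, keepdims=True)

    return posterior.argmax(axis=1), confusion
```

Majority voting gives the EM iterations a sensible starting point; each worker's estimated confusion matrix then reweights that worker's votes in subsequent E-steps, which is how reliable workers come to dominate the inferred truth.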
