Improving Quality of Crowdsourced Labels via Probabilistic Matrix Factorization

Quality assurance in crowdsourced annotation often involves having a given example labeled multiple times by different workers and then aggregating those labels. Unfortunately, the resulting worker-example label matrix is typically sparse and imbalanced for two reasons: 1) the average crowd worker judges only a few examples; and 2) few labels are typically collected per example in order to reduce cost. To address this missing-data problem, we propose the use of probabilistic matrix factorization (PMF), a standard approach in collaborative filtering. To evaluate our approach, we measure the accuracy of consensus labels computed from the input sparse matrix versus from the PMF-inferred complete matrix. We consider both unsupervised and supervised settings; in the supervised case, we evaluate both weighted voting and worker selection. Experiments are performed on a synthetic data set and on a real data set: crowd relevance judgments from the 2010 NIST TREC Relevance Feedback Track.
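
To make the pipeline concrete, the sketch below illustrates one plausible way to apply PMF to a sparse worker-by-example label matrix and then read off consensus labels from the completed matrix. This is not the paper's implementation: the latent dimensionality, learning rate, regularization weight, simple gradient-descent training loop, and toy data generator are all illustrative assumptions standing in for real crowd judgments.

# Minimal PMF sketch (illustrative only, not the authors' implementation).
# Hyperparameters and the toy data below are assumptions for demonstration.
import numpy as np

rng = np.random.default_rng(0)

def pmf_complete(R, mask, k=5, lam=0.1, lr=0.01, epochs=200):
    """Factor the observed entries of R (workers x examples) into U @ V.T.

    R    : observed label matrix (e.g., 0/1 relevance votes), zeros where unobserved
    mask : boolean matrix, True where a worker actually labeled an example
    k    : latent dimensionality
    lam  : L2 regularization weight (Gaussian priors on worker/example factors)
    """
    n_workers, n_examples = R.shape
    U = 0.1 * rng.standard_normal((n_workers, k))
    V = 0.1 * rng.standard_normal((n_examples, k))
    for _ in range(epochs):
        E = mask * (U @ V.T - R)          # reconstruction error on observed entries only
        U -= lr * (E @ V + lam * U)       # gradient step on worker factors
        V -= lr * (E.T @ U + lam * V)     # gradient step on example factors
    return U @ V.T                        # dense, PMF-inferred label matrix

# Toy usage: 10 workers, 20 examples, ~30% of entries observed, 20% label noise.
true_labels = rng.integers(0, 2, size=20)
noisy = (true_labels[None, :] ^ (rng.random((10, 20)) < 0.2)).astype(float)
mask = rng.random((10, 20)) < 0.3
R = noisy * mask

completed = pmf_complete(R, mask)
# Unsupervised consensus: threshold the per-example mean over the completed matrix.
consensus = (completed.mean(axis=0) > 0.5).astype(int)
print("agreement with truth:", (consensus == true_labels).mean())

In the supervised settings evaluated in the paper, one could instead use each worker's accuracy on known gold labels to weight the votes or to select which workers' rows contribute to the consensus; the thresholded column mean shown here is only the simplest unsupervised baseline.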
