Learning from Crowds and Experts

Crowdsourcing services are often used to collect large amounts of labeled data for machine learning. Although they provide an easy way to obtain labels at very low cost and in a short period of time, they have serious limitations, one of which is the variable quality of crowd-generated data. There have been many attempts to increase the reliability of crowd-generated data and the quality of classifiers trained on such data. However, in these problem settings, relatively few researchers have explored using expert-generated data to achieve further improvements. In this paper, we extend three models that address the problem of learning from crowds so that they can utilize expert-provided ground truths: a latent class model, a personal classifier model, and a data-dependent error model. We evaluate the proposed methods against two baseline methods on a real data set to demonstrate the effectiveness of combining crowd-generated and expert-generated data.
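To make the latent class extension concrete, below is a minimal sketch of a Dawid-Skene-style EM procedure in which items carrying expert-provided ground truths have their label posteriors clamped, while the remaining items are inferred from per-worker confusion matrices. The function name, smoothing constants, and toy data are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: latent class model (Dawid-Skene-style EM) extended so
# that expert-labeled items keep their ground-truth labels fixed.
import numpy as np

def em_crowds_with_experts(labels, n_classes, expert=None, n_iter=50):
    """labels: (n_items, n_workers) int array, -1 where a worker gave no label.
    expert: dict {item_index: true_class} for expert-labeled items (optional)."""
    n_items, n_workers = labels.shape
    # Initialize label posteriors by smoothed per-item majority vote.
    q = np.full((n_items, n_classes), 1.0 / n_classes)
    for i in range(n_items):
        votes = labels[i][labels[i] >= 0]
        if len(votes):
            counts = np.bincount(votes, minlength=n_classes) + 1e-2
            q[i] = counts / counts.sum()
    expert = expert or {}
    for i, y in expert.items():          # clamp expert-labeled items
        q[i] = np.eye(n_classes)[y]
    for _ in range(n_iter):
        # M-step: class prior and per-worker confusion matrices (smoothed).
        pi = q.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), 1e-2)
        for i in range(n_items):
            for w in range(n_workers):
                if labels[i, w] >= 0:
                    conf[w, :, labels[i, w]] += q[i]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: update posteriors of non-expert items only.
        for i in range(n_items):
            if i in expert:
                continue
            log_p = np.log(pi)
            for w in range(n_workers):
                if labels[i, w] >= 0:
                    log_p += np.log(conf[w, :, labels[i, w]])
            p = np.exp(log_p - log_p.max())
            q[i] = p / p.sum()
    return q

# Toy usage: 4 items, 3 workers, binary labels; item 0 is expert-labeled.
labels = np.array([[0, 0, 1], [1, 1, 1], [0, 1, 0], [1, 0, -1]])
posteriors = em_crowds_with_experts(labels, n_classes=2, expert={0: 0})
print(posteriors.round(2))
```

Clamping the expert items anchors the worker confusion matrices, which is what lets even a small amount of expert-generated data sharpen the estimates for the crowd-labeled items.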
