Exploiting Structure in Crowdsourcing Tasks via Latent Factor Models

Internet crowdsourcing services such as Amazon Mechanical Turk (1) and the ESP Game (15) have become important tools for the machine learning community because they make it possible to label large datasets in a distributed fashion at little cost. A key challenge when using crowdsourcing to label datasets is deriving high-quality labels by aggregating the responses of labelers of varying reliability over data instances of varying difficulty. Existing algorithms for quality control and label inference (14; 17; 10) suffer from two significant shortcomings: (1) They cannot model interaction effects between labelers and data items, such as when some labelers have specialized knowledge about a particular subset of items. (2) They assume that labelers' accuracies, as well as data instances' difficulties, are independent. In reality, there may be a priori information about labelers that predicts their accuracy at the labeling task. Analogously, certain features shared among data instances may predict how difficult those instances are to label correctly. In this paper, we present an algorithm that addresses both of these shortcomings. We demonstrate that the proposed algorithm infers data labels more accurately than previous methods on a difficult facial expression labeling task. Finally, we show that our proposed model subsumes certain previous models as special cases.

[1] Gerardo Hermosillo, et al. Supervised learning from multiple experts: whom to trust when everyone lies a bit, 2009, ICML '09.

[2] Javier R. Movellan, et al. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise, 2009, NIPS.

[3] Gwen Littlewort, et al. Toward Practical Smile Detection, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Ohad Shamir, et al. Good learners for evil teachers, 2009, ICML '09.

[5] C. M. Reeves, et al. Function minimization by conjugate gradients, 1964, Comput. J.

[6] Simon Rogers, et al. Semi-parametric analysis of multi-rater data, 2010, Stat. Comput.

[7] Ruslan Salakhutdinov, et al. Probabilistic Matrix Factorization, 2007, NIPS.

[8] Gert R. G. Lanckriet, et al. User-centered design of a social game to tag music, 2009, HCOMP '09.

[9] Valen E. Johnson, et al. On Bayesian Analysis of Multirater Ordinal Data: An Application to Automated Essay Grading, 1996.

[10] W. Batchelder, et al. Markov chain estimation for test theory without an answer key, 2003.

[11] Laura A. Dabbish, et al. Labeling images with a computer game, 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.

[12] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[13] Brendan T. O'Connor, et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.

[14] Panagiotis G. Ipeirotis, et al. Get another label? improving data quality and data mining using multiple, noisy labelers, 2008, KDD.

[15] A. P. Dawid, et al. Maximum Likelihood Estimation of Observer Error‐Rates Using the EM Algorithm, 1979.