Variational Gaussian process for missing label crowdsourcing classification problems

In this paper we address the crowdsourcing classification problem, in which a classifier must be trained without access to the true labels. Instead, each sample receives labels from several annotators (typically with different degrees of expertise), and these labels may disagree. The problem is formulated with Bayesian modeling, and the formulation allows each annotator to label only a subset of the training samples. Although Bayesian approaches have been proposed before in the literature, we introduce Variational Bayes inference to derive an iterative algorithm in which all latent variables are estimated automatically. In the experimental section, the proposed model is evaluated and compared with other state-of-the-art methods on two real datasets.
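To make the flavor of such an iterative scheme concrete, the sketch below alternates between (a) forming posteriors over the unknown true labels from whatever annotations are available, weighted by per-annotator reliability, and (b) re-estimating those reliabilities. It is a minimal illustration, not the paper's variational GP model: a simple kernel smoother stands in for the GP classifier, and the function names (`crowd_em`, `rbf_kernel`), hyperparameters, and binary-label setting are illustrative assumptions only.

```python
# Minimal sketch (not the paper's exact model): alternate between estimating
# posteriors over the unknown true labels from the available annotations,
# weighted by per-annotator reliability, and re-estimating those reliabilities.
# A kernel smoother stands in for the variational GP classifier; all names,
# hyperparameters and the binary-label setting are illustrative assumptions.
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    """Squared-exponential kernel between two sets of feature vectors."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def crowd_em(X, Y, n_iter=20, lengthscale=1.0, jitter=1e-3):
    """X: (N, D) features. Y: (N, A) annotator labels in {0, 1}, np.nan where missing."""
    N, A = Y.shape
    observed = ~np.isnan(Y)
    counts = observed.sum(axis=1)
    # Initialise soft true-label posteriors with a per-sample vote average.
    votes = np.where(observed, np.nan_to_num(Y), 0.0).sum(axis=1)
    mu = np.where(counts > 0, votes / np.maximum(counts, 1), 0.5)
    alpha = np.full(A, 0.8)                        # per-annotator accuracy estimates
    K = rbf_kernel(X, X, lengthscale) + jitter * np.eye(N)
    for _ in range(n_iter):
        # (a) posterior over true labels given annotations and reliabilities
        log_odds = np.zeros(N)
        for a in range(A):
            obs = observed[:, a]
            agree = np.log(alpha[a]) - np.log(1.0 - alpha[a])
            log_odds[obs] += np.where(Y[obs, a] == 1, agree, -agree)
        mu = 1.0 / (1.0 + np.exp(-log_odds))
        # Smooth the posteriors through the kernel (GP-classifier stand-in),
        # which also propagates information to samples with few or no labels.
        f = K @ np.linalg.solve(K + np.eye(N), mu - 0.5)
        mu = np.clip(0.5 + f, 1e-3, 1.0 - 1e-3)
        # (b) update each annotator's reliability against the current soft labels
        for a in range(A):
            obs = observed[:, a]
            if obs.any():
                agree = np.where(Y[obs, a] == 1, mu[obs], 1.0 - mu[obs])
                alpha[a] = np.clip(agree.mean(), 0.05, 0.95)
    return mu, alpha
```

Missing entries in Y are simply skipped, so an annotator who labeled only a small part of the training set still contributes exactly where they labeled, which mirrors the missing-label scenario the paper targets.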
