Learning From Noisy Singly-labeled Data

Supervised learning depends on annotated examples, which are taken to be the ground truth. But these labels often come from noisy crowdsourcing platforms, like Amazon Mechanical Turk. Practitioners typically collect multiple labels per example and aggregate the results to mitigate noise (the classic crowdsourcing problem). Given a fixed annotation budget and unlimited unlabeled data, redundant annotation comes at the expense of fewer labeled examples. This raises two fundamental questions: (1) How can we best learn from noisy workers? (2) How should we allocate our labeling budget to maximize the performance of a classifier? We propose a new algorithm for jointly modeling labels and worker quality from noisy crowdsourced data. The alternating minimization proceeds in rounds, estimating worker quality from disagreement with the current model and then updating the model by optimizing a loss function that accounts for the current estimate of worker quality. Unlike previous approaches, our algorithm can estimate worker quality even when each example receives only a single annotation. We prove a generalization error bound for models learned with our algorithm and show theoretically that, when worker quality exceeds a threshold, it is better to label many examples once than to label fewer examples redundantly. Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits.
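To make the alternating loop concrete, here is a minimal sketch in Python, assuming one label per example and a pluggable training routine. The names (`train_model`, `worker_labels`) and the simple agreement-based weighting rule are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def alternating_minimization(X, worker_labels, train_model, n_rounds=5):
    """Minimal sketch of the alternating scheme described above.

    X             : array of n training examples.
    worker_labels : list of (worker_id, label) pairs, one singly-collected
                    label per example, aligned with the rows of X.
    train_model   : callable (X, labels, sample_weights) -> array of shape
                    (n, n_classes) of predicted class probabilities.
    """
    n = len(worker_labels)
    y = np.array([label for (_, label) in worker_labels])
    workers = {w for (w, _) in worker_labels}

    probs = train_model(X, y, np.ones(n))  # round 0: trust all labels equally

    for _ in range(n_rounds):
        # Step 1: re-estimate each worker's quality as the average posterior
        # probability (under the current model) that her labels are correct.
        agree = {w: 0.0 for w in workers}
        count = {w: 0 for w in workers}
        for i, (w, label) in enumerate(worker_labels):
            agree[w] += probs[i, label]
            count[w] += 1
        quality = {w: agree[w] / count[w] for w in workers}

        # Step 2: retrain with a loss that accounts for estimated quality,
        # here by down-weighting labels from workers near chance accuracy
        # (a weight of ~0 for a worker whose labels look random).
        weights = np.array([max(2.0 * quality[w] - 1.0, 0.0)
                            for (w, _) in worker_labels])
        probs = train_model(X, y, weights)

    return probs, quality
```

Zeroing out the weight of workers near chance accuracy lets the loop effectively discard spammers' labels, while labels from reliable workers enter the loss at close to full weight.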
