Improving Label Quality in Crowdsourcing Using Noise Correction

This paper proposes a novel framework that applies noise correction techniques to further improve label quality after ground truth inference in crowdsourcing. Within the framework, an adaptive voting noise correction algorithm (AVNC) identifies and corrects the labels most likely to be noisy, using the labeler qualities estimated during ground truth inference. Experimental results on two real-world datasets show that (1) the framework improves label quality regardless of the inference algorithm used, especially when each example has only a few noisy labels; and (2) because AVNC considers both the number and the probability of potential noisy labels, it outperforms a baseline noise correction algorithm.
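The general idea of correcting labels after inference can be illustrated with a minimal sketch. It is not the AVNC algorithm itself, whose details the abstract does not give. It assumes majority voting as the inference step, labeler quality estimated as agreement with the inferred labels, and a quality-weighted vote as the correction rule. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels_per_example):
    """Inference step (assumed): one integrated label per example by majority."""
    return {ex: Counter(votes.values()).most_common(1)[0][0]
            for ex, votes in labels_per_example.items()}


def labeler_quality(labels_per_example, integrated):
    """Estimate each labeler's quality as agreement with the integrated labels."""
    hits, total = defaultdict(int), defaultdict(int)
    for ex, votes in labels_per_example.items():
        for labeler, label in votes.items():
            total[labeler] += 1
            hits[labeler] += int(label == integrated[ex])
    return {labeler: hits[labeler] / total[labeler] for labeler in total}


def correct_noise(labels_per_example):
    """Flag the integrated labels most likely to be noisy and relabel them.

    Assumed heuristic: the noise probability of an integrated label is one
    minus its quality-weighted vote share, and the number of labels to
    correct is the rounded sum of those probabilities.
    """
    integrated = majority_vote(labels_per_example)
    quality = labeler_quality(labels_per_example, integrated)

    noise_prob, best = {}, {}
    for ex, votes in labels_per_example.items():
        scores = defaultdict(float)
        for labeler, label in votes.items():
            scores[label] += quality[labeler]
        total = sum(scores.values())
        noise_prob[ex] = 1.0 - scores[integrated[ex]] / total if total else 0.0
        best[ex] = max(scores, key=scores.get)

    # Estimated number of noisy integrated labels (how many to correct).
    k = round(sum(noise_prob.values()))
    # Rank examples by noise probability (which ones to correct).
    suspects = sorted(noise_prob, key=noise_prob.get, reverse=True)[:k]

    corrected = dict(integrated)
    for ex in suspects:
        corrected[ex] = best[ex]
    return corrected, suspects, quality


# Toy data: example id -> {labeler id: binary label}.
labels = {
    "e1": {"a": 1, "b": 1, "c": 1},
    "e2": {"a": 0, "b": 0, "c": 0},
    "e3": {"a": 1, "b": 1, "c": 0},
    "e4": {"a": 0, "b": 0, "c": 1},
    "e5": {"c": 1, "a": 0},
}
corrected, suspects, quality = correct_noise(labels)
print(suspects, corrected)
```

In this toy run, the tie on `e5` is broken by majority voting in favour of the less reliable labeler `c`; the correction step then flags `e5` and flips it to the label given by the more reliable labeler `a`. A fuller treatment would likely use a learned classifier over example features rather than the raw votes alone.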
