Sloppiness mitigation in crowdsourcing: detecting and correcting bias for crowd scoring tasks

Because crowd workers differ in expertise and personal preference, and may suffer fatigue from long working sessions, data obtained through crowdsourcing are often unreliable, and a central challenge is to recover true information from such noisy data. Sloppiness, the phenomenon of observed labels fluctuating around the true labels, is a type of error that has rarely been studied, and most existing approaches derive truths only for binary labeling tasks. In this paper, we address sloppiness in crowd scoring tasks, where labels are ordinal and multi-valued rather than binary, in order to obtain high-quality estimated labels. Sloppy workers can make scoring unreliable, but we show that biased sloppy workers, those who consistently give higher (or lower) answers than the true labels, can be effectively exploited to improve the quality of the estimated labels. To make use of labels from workers with biased sloppy behavior, we propose an iterative two-step model for inferring true labels: the first step identifies the biased workers and corrects their biases, and the second step applies an optimization-based truth discovery framework to derive true labels from the high-quality observed labels together with the corrected labels from the first step. We also present a hierarchical categorization of different types of crowd workers. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed framework compared with baseline models such as majority voting and an expectation-maximization-based aggregation algorithm, with accuracy improvements of up to 16%.
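To make the two-step idea concrete, the following is a minimal sketch, not the authors' implementation, of one way the loop could be organized. It assumes each worker's bias is a constant additive offset on an ordinal score scale, uses a simple threshold to flag biased workers, and uses a generic CRH-style weighted aggregation as the truth-discovery step; the function name, the threshold, and the log-ratio weighting are illustrative placeholders rather than the paper's actual formulation.

import numpy as np

def infer_truths(scores, n_iters=20, bias_threshold=0.5, eps=1e-8):
    """Sketch of an iterative bias-correction + truth-discovery loop.

    scores: 2-D array of shape (n_workers, n_items); NaN marks items a
    worker did not score. Returns an estimated true score per item.
    """
    truths = np.nanmean(scores, axis=0)  # initialize truths with the plain average

    for _ in range(n_iters):
        # Step 1: estimate each worker's systematic offset from the current
        # truth estimates and subtract it for workers flagged as biased.
        offsets = np.nanmean(scores - truths, axis=1)
        corrected = scores.copy()
        biased = np.abs(offsets) > bias_threshold
        corrected[biased] -= offsets[biased, None]

        # Step 2: optimization-based truth discovery (CRH-style): weight each
        # worker by how closely its corrected scores match the current truths,
        # then recompute truths as the weighted average of corrected scores.
        errors = np.nansum((corrected - truths) ** 2, axis=1)
        weights = np.log((np.nansum(errors) + eps) / (errors + eps))
        weights = np.clip(weights, eps, None)

        mask = ~np.isnan(corrected)
        filled = np.where(mask, corrected, 0.0)
        truths = (weights[:, None] * filled).sum(axis=0) / (
            (weights[:, None] * mask).sum(axis=0) + eps)

    return truths

In this sketch, a worker who always scores one point too high is not discarded: once the offset is removed, that worker's corrected labels contribute to the aggregation with a weight reflecting their (now small) deviation from the estimated truths, which is the intuition behind reusing biased sloppy workers rather than filtering them out.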
