Answer validation for generic crowdsourcing tasks with minimal efforts

Crowdsourcing has been established as an essential means to scale human computation in diverse Web applications, ranging from data integration to information retrieval. Yet, crowd workers have wide-ranging levels of expertise. Large worker populations are heterogeneous and include a significant number of faulty workers. As a consequence, quality assurance for crowd answers is commonly seen as the Achilles heel of crowdsourcing. Although various techniques for quality control have been proposed in recent years, a post-processing phase in which crowd answers are validated is still required. Such validation, however, is typically conducted by experts, whose availability is limited and whose work incurs comparatively high costs. This work aims at guiding an expert in the validation of crowd answers. We present a probabilistic model that helps to identify the most beneficial validation questions in terms of both improvement in result correctness and detection of faulty workers. By seeking expert feedback on the most problematic cases, we are able to obtain a set of high-quality answers, even if the expert does not validate the complete answer set. Our approach is applicable to a broad range of crowdsourcing tasks, including classification and counting. Our comprehensive evaluation using both real-world and synthetic datasets demonstrates that our techniques save up to 60% of expert effort compared to baseline methods when striving for perfect result correctness. In absolute terms, for most cases, we achieve close to perfect correctness after expert input has been sought for only 15% of the crowdsourcing tasks.
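The abstract does not spell out how the "most beneficial validation questions" are selected. As a minimal, purely illustrative sketch (not the paper's actual model), the snippet below ranks tasks by the entropy of their crowd answer distribution, a common uncertainty heuristic, so that the most ambiguous answers are shown to the expert first. The data and identifiers (crowd_answers, answer_entropy) are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical crowd answers: task id -> list of worker labels.
# Illustrative data only; the paper's datasets are not reproduced here.
crowd_answers = {
    "t1": ["cat", "cat", "cat", "cat"],
    "t2": ["cat", "dog", "dog", "cat"],
    "t3": ["dog", "dog", "cat", "dog"],
}

def answer_entropy(labels):
    """Shannon entropy of the empirical label distribution for one task."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Rank tasks by uncertainty; the most ambiguous task is the next
# candidate for expert validation.
ranking = sorted(crowd_answers,
                 key=lambda t: answer_entropy(crowd_answers[t]),
                 reverse=True)
print(ranking)  # ['t2', 't3', 't1']
```

In such a scheme the expert validates tasks in the order given by `ranking` and can stop once the remaining tasks fall below an uncertainty threshold, which is one simple way to trade validation effort against result correctness.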
