Answer validation for generic crowdsourcing tasks with minimal efforts

Crowdsourcing has been established as an essential means to scale human computation in diverse Web applications, ranging from data integration to information retrieval. Yet, crowd workers have wide-ranging levels of expertise. Large worker populations are heterogeneous and include a significant number of faulty workers. As a consequence, quality assurance for crowd answers is commonly seen as the Achilles heel of crowdsourcing. Although various techniques for quality control have been proposed in recent years, a post-processing phase in which crowd answers are validated is still required. Such validation, however, is typically conducted by experts, whose availability is limited and whose work incurs comparatively high costs. This work aims at guiding an expert in the validation of crowd answers. We present a probabilistic model that helps to identify the most beneficial validation questions in terms of both improvement in result correctness and detection of faulty workers. By seeking expert feedback on the most problematic cases, we are able to obtain a set of high-quality answers, even if the expert does not validate the complete answer set. Our approach is applicable to a broad range of crowdsourcing tasks, including classification and counting. Our comprehensive evaluation using both real-world and synthetic datasets demonstrates that our techniques save up to 60% of expert effort compared to baseline methods when striving for perfect result correctness. In absolute terms, for most cases, we achieve close to perfect correctness after expert input has been sought for only 15% of the crowdsourcing tasks.
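The abstract does not spell out how the "most beneficial validation questions" are selected. As a minimal, purely illustrative sketch (not the paper's actual model), the snippet below ranks tasks by the entropy of their crowd answer distribution, a common uncertainty heuristic, so that the most ambiguous answers are shown to the expert first. The data and identifiers (crowd_answers, answer_entropy) are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical crowd answers: task id -> list of worker labels.
# Illustrative data only; the paper's datasets are not reproduced here.
crowd_answers = {
    "t1": ["cat", "cat", "cat", "cat"],
    "t2": ["cat", "dog", "dog", "cat"],
    "t3": ["dog", "dog", "cat", "dog"],
}

def answer_entropy(labels):
    """Shannon entropy of the empirical label distribution for one task."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Rank tasks by uncertainty; the most ambiguous task is the next
# candidate for expert validation.
ranking = sorted(crowd_answers,
                 key=lambda t: answer_entropy(crowd_answers[t]),
                 reverse=True)
print(ranking)  # ['t2', 't3', 't1']
```

In such a scheme the expert validates tasks in the order given by `ranking` and can stop once the remaining tasks fall below an uncertainty threshold, which is one simple way to trade validation effort against result correctness.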
