Evaluating the crowd with confidence

Worker quality control is a crucial aspect of crowdsourcing systems, typically occupying a large fraction of the time and money invested in crowdsourcing. In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of worker quality. We show that our techniques generate correct confidence intervals on a range of real-world datasets, and demonstrate their wide applicability by using them to evict poorly performing workers and to provide confidence intervals on the accuracy of the answers.
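
To make the idea of a confidence interval on a worker's error rate concrete, here is a minimal sketch. It is not the technique devised in this work: it assumes the simplest setting, in which each worker answer can be checked against a known correct answer, and it uses the standard Wilson score interval for a binomial proportion. The names `wilson_interval`, `should_evict`, and `max_error_rate` are hypothetical and chosen only for illustration.

```python
import math
from typing import Tuple


def wilson_interval(errors: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an error rate (z = 1.96 gives roughly 95%).

    Assumption: each of the `trials` answers was checked against a known
    correct answer, and `errors` of them were wrong.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= errors <= trials:
        raise ValueError("errors must lie between 0 and trials")
    p_hat = errors / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    # Clamp to [0, 1] to guard against floating-point drift at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def should_evict(errors: int, trials: int, max_error_rate: float = 0.3,
                 z: float = 1.96) -> bool:
    """Hypothetical eviction rule: evict only when even the lower bound of
    the interval exceeds the tolerated error rate."""
    lower, _ = wilson_interval(errors, trials, z)
    return lower > max_error_rate


# The same observed error rate (50%) leads to different decisions
# depending on how much evidence there is.
print(should_evict(5, 10))    # few answers: interval is wide
print(should_evict(50, 100))  # many answers: interval is narrow
```

Basing the decision on the lower bound rather than the point estimate means a worker is evicted only once there is enough evidence that the error rate truly exceeds the threshold, which is one way an interval supports the eviction use case mentioned above.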
