Crowdsourcing High Quality Labels with a Tight Budget

In the past decade, commercial crowdsourcing platforms have revolutionized the way data are classified and annotated, especially for large datasets. Obtaining labels for a single instance can be inexpensive, but for large datasets it is important to allocate the budget wisely. With a limited budget, requesters must trade off the quantity of labeled instances against the quality of the final results. Existing budget allocation methods can achieve good quantity but cannot guarantee high quality for individual instances under a tight budget. In some scenarios, however, requesters may prefer to label fewer instances at higher quality, and they may have different quality requirements for different tasks. To address these challenges, we propose a flexible budget allocation framework called Requallo. Requallo lets requesters set their specific requirements on labeling quality and maximizes the number of labeled instances that meet those requirements under a tight budget. The budget allocation problem is modeled as a Markov decision process, and a sequential labeling policy is produced. The policy greedily selects, as the next instance to query, the one that provides the maximum reward toward this goal. The Requallo framework is further extended to account for worker reliability so that the budget can be allocated more effectively. Experiments on two real-world crowdsourcing tasks as well as a simulated task demonstrate that, when the budget is tight, Requallo outperforms existing state-of-the-art budget allocation methods in both quantity and quality.

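The greedy sequential policy described in the abstract can be illustrated with a small simulation. The sketch below is a minimal, hypothetical rendering of that idea, not the paper's actual Requallo algorithm: it assumes binary labels, a per-instance Beta(1, 1) posterior over the true label, a quality requirement expressed as a posterior-confidence threshold, and a one-step-lookahead reward equal to the expected gain in confidence. The names `posterior_confidence`, `expected_reward`, and `allocate`, and the reward definition itself, are illustrative assumptions.

```python
import random

# Minimal sketch of a greedy, sequential budget-allocation loop in the spirit
# of the abstract. Each binary-labeled instance is summarized by its (positive,
# negative) vote counts; "meeting the quality requirement" is taken to mean the
# posterior probability of the majority label exceeds a requester-chosen
# threshold. These modeling choices are illustrative assumptions, not the
# paper's exact formulation.

THRESHOLD = 0.9  # requester's quality requirement on posterior confidence


def posterior_confidence(pos, neg):
    """Posterior probability of the majority label under a Beta(1, 1) prior,
    treating crowd votes as independent (a deliberate simplification)."""
    a, b = pos + 1, neg + 1
    return max(a, b) / (a + b)


def expected_reward(pos, neg):
    """One-step lookahead: expected increase in posterior confidence from one
    more label. Instances that already satisfy the requirement get zero reward
    so the remaining budget flows to instances that still need labels."""
    conf_now = posterior_confidence(pos, neg)
    if conf_now >= THRESHOLD:
        return 0.0
    p_pos = (pos + 1) / (pos + neg + 2)  # predictive probability of a positive vote
    conf_next = (p_pos * posterior_confidence(pos + 1, neg)
                 + (1 - p_pos) * posterior_confidence(pos, neg + 1))
    return conf_next - conf_now


def allocate(instances, budget, query_worker):
    """Greedy sequential policy: spend one label at a time on the instance with
    the largest expected reward until the budget is exhausted."""
    counts = {i: [0, 0] for i in instances}  # instance -> [pos votes, neg votes]
    while budget > 0:
        best = max(instances, key=lambda i: expected_reward(*counts[i]))
        label = query_worker(best)           # request one more crowd label
        counts[best][0 if label == 1 else 1] += 1
        budget -= 1
    return {i: (int(pos >= neg), posterior_confidence(pos, neg))
            for i, (pos, neg) in counts.items()}


if __name__ == "__main__":
    # Simulated crowd: hidden true labels and 75%-accurate workers, purely for
    # demonstration of the allocation loop.
    truth = {i: random.choice([0, 1]) for i in range(20)}
    worker = lambda i: truth[i] if random.random() < 0.75 else 1 - truth[i]
    results = allocate(list(truth), budget=100, query_worker=worker)
    confident = sum(conf >= THRESHOLD for _, conf in results.values())
    print(f"{confident} of {len(results)} instances meet the quality requirement")
```

In this toy version the only stopping rule is budget exhaustion, and instances that already meet the threshold are simply starved of further labels; the framework in the paper additionally models worker reliability when deciding where to spend the remaining budget.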