More for less: adaptive labeling payments in online labor markets

In many predictive tasks that require human intelligence to label training instances, online crowdsourcing markets have emerged as promising platforms for large-scale, cost-effective labeling. However, these platforms also introduce significant challenges that must be addressed for these opportunities to materialize. In particular, prior work has shown that the trade-off between the payment offered to labelers and the quality of the resulting labels varies over time, possibly as a result of changing market conditions and the nature of the tasks themselves. Because the mechanism underlying these varying trade-offs is not well understood, for any given labeling task at any given time it is not known which labeling payment to offer in the market so as to produce accurate models cost-effectively. Moreover, because the labels acquired in these markets are not always correct, estimating the expected effect that labels acquired at a given payment will have on model performance is particularly challenging. Effective and robust methods for dealing with these challenges are essential to enable a growing reliance on these promising and increasingly popular labor markets for large-scale labeling. In this paper, we first present the new problem of Adaptive Labeling Payment (ALP): how to learn and sequentially adapt the payment offered to crowd labelers before they undertake a labeling task, so as to achieve a given predictive performance cost-effectively. We then develop an ALP approach and discuss the key challenges it addresses in order to yield consistently good performance. We evaluate our approach extensively across a wide variety of market conditions. Our results demonstrate that the proposed ALP method yields significant cost savings and robust performance across settings. As such, our ALP approach can serve as a benchmark for future mechanisms for cost-effective selection of labeling payments.
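To make the sequential-decision flavor of the ALP problem concrete, the following is a minimal, illustrative sketch and not the paper's actual algorithm: it simulates a loop that, at each step, estimates the label accuracy obtainable at each candidate payment level, selects the payment with the best estimated accuracy-per-dollar via a Thompson-style draw, acquires a batch of noisy labels at that payment, and retrains a classifier. The candidate payments, the assumed payment-to-accuracy relationship, and the selection rule are all assumptions made purely for illustration.

```python
# Illustrative sketch only: NOT the ALP method from the paper. Payment levels,
# the simulated payment->accuracy mapping, and the accuracy/cost selection
# rule are assumptions for demonstration purposes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

payments = [0.02, 0.05, 0.10]                      # candidate per-label payments (assumed)
true_acc = {0.02: 0.70, 0.05: 0.85, 0.10: 0.90}    # simulated labeler accuracy per payment
obs = {p: [1, 1] for p in payments}                # Beta(correct+1, incorrect+1) counts

labeled_X, labeled_y = [], []
budget, spent, batch = 20.0, 0.0, 25
model = LogisticRegression(max_iter=1000)

def acquire(idx, payment):
    """Simulate noisy crowd labels whose accuracy depends on the payment."""
    correct = rng.random(len(idx)) < true_acc[payment]
    return np.where(correct, y_pool[idx], 1 - y_pool[idx]), correct

while spent + batch * min(payments) <= budget and len(X_pool) >= batch:
    # Thompson-style draw of each payment's label accuracy, then pick the
    # payment with the best (estimated accuracy) / cost ratio.
    draws = {p: rng.beta(*obs[p]) for p in payments}
    p_star = max(payments, key=lambda p: draws[p] / p)

    idx = rng.choice(len(X_pool), size=batch, replace=False)
    noisy_y, correct = acquire(idx, p_star)
    spent += batch * p_star

    # Update the accuracy estimate for the chosen payment. Here we peek at
    # ground truth only because this is a simulation; in a real market,
    # correctness would have to be estimated, e.g. via repeated labeling.
    obs[p_star][0] += int(correct.sum())
    obs[p_star][1] += int((~correct).sum())

    labeled_X.append(X_pool[idx]); labeled_y.append(noisy_y)
    X_pool = np.delete(X_pool, idx, axis=0); y_pool = np.delete(y_pool, idx)

    model.fit(np.vstack(labeled_X), np.concatenate(labeled_y))
    print(f"payment={p_star:.2f} spent={spent:.2f} "
          f"test acc={model.score(X_test, y_test):.3f}")
```

The key design point this sketch highlights is that the payment decision is revisited before each acquisition batch, so spending can shift toward cheaper labels when they prove accurate enough and toward higher payments when label quality would otherwise limit model improvement.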
