Millionaire: a hint-guided approach for crowdsourcing

Modern machine learning is moving into an era of complex models, which require large amounts of well-annotated data. While crowdsourcing is a promising tool for obtaining such data, existing crowdsourcing approaches rarely acquire a sufficient number of high-quality labels. In this paper, motivated by the “Guess-with-Hints” answer strategy from the Millionaire game show, we introduce a hint-guided approach to crowdsourcing to address this challenge. Our approach encourages workers to seek help from hints when they are unsure of a question. Specifically, we propose a hybrid-stage setting consisting of a main stage and a hint stage. When workers face an uncertain question in the main stage, they are allowed to enter the hint stage and look up hints before giving an answer. We develop a unique payment mechanism that meets two important design principles for crowdsourcing. In addition, the proposed mechanism encourages high-quality workers to use hints less often, which helps identify these workers and assign them larger possible payments. Experiments on Amazon Mechanical Turk show that our approach ensures a sufficient number of high-quality labels at low expenditure and detects high-quality workers.
