Statistical Quality Control for Human Computation and Crowdsourcing

Human computation is a method for solving difficult problems by combining the abilities of humans and computers. Quality control is a critical issue in human computation because it relies on a large number of participants (i.e., crowds) whose reliability is uncertain. One solution to this issue is to leverage the power of the “wisdom of crowds”; for example, we can aggregate the outputs of multiple participants, or ask one participant to check the output of another to improve its quality. In this paper, we review several statistical approaches for controlling the quality of outputs from crowds.
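
As a concrete illustration of the aggregation idea mentioned above, the sketch below shows the simplest statistical quality-control scheme, majority voting over redundant worker labels. This is an illustrative example rather than the specific method of the paper (more elaborate approaches, such as Dawid and Skene's EM-based estimator, additionally model each worker's reliability); the function name and the toy data are assumptions made for this sketch.

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate crowd labels by simple majority voting.

    labels_per_item: dict mapping each item id to the list of labels
    assigned by different workers. Returns a dict mapping each item id
    to its most frequent label (ties broken arbitrarily).
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Example: three workers label two images
crowd_labels = {
    "img_1": ["cat", "cat", "dog"],
    "img_2": ["dog", "dog", "dog"],
}
print(majority_vote(crowd_labels))  # {'img_1': 'cat', 'img_2': 'dog'}
```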
