A unified statistical framework for crowd labeling

Research on human computation via crowdsourcing has grown rapidly in recent years. Multiple-choice (labeling) questions are a common type of problem solved with this approach; in particular, crowd labeling is used to obtain true labels for large machine learning datasets. Since crowd workers are not necessarily experts, the labels they provide are often noisy and erroneous. The usual remedy is to collect multiple labels for each sample and aggregate them to estimate the true label. Although this redundancy yields high-quality labels, it is costly, so current efforts aim to maximize the accuracy of the estimated true labels for a fixed number of acquired labels. This paper surveys methods that aggregate redundant crowd labels to estimate unknown true labels. It presents a unified statistical latent model in which the differences among popular methods in the field correspond to different choices of the model's parameters. Algorithms for performing inference on these models are then surveyed, and adaptive methods that iteratively collect labels based on previously collected labels and the estimated model are discussed. Finally, the paper compares the prominent methods and provides guidelines for future work to address current open issues.
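To make the aggregation idea concrete, here is a minimal sketch of a Dawid–Skene-style EM aggregator, the classic latent-model approach alluded to above. It assumes labels arrive as a dense (n_items x n_workers) integer array with -1 marking "worker did not label this item"; the variable names, the iteration count, and the smoothing constant are illustrative choices, not details taken from the survey.

```python
# A minimal EM sketch for aggregating redundant crowd labels
# (Dawid-Skene-style latent model; assumptions noted in the text above).
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50, smooth=1e-2):
    n_items, n_workers = labels.shape
    observed = labels >= 0

    # Initialize the posterior over true labels with (smoothed) majority voting.
    q = np.full((n_items, n_classes), smooth)
    for i in range(n_items):
        for j in range(n_workers):
            if observed[i, j]:
                q[i, labels[i, j]] += 1.0
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and one confusion matrix per worker,
        # pi[j, k, l] = P(worker j reports l | true class is k).
        prior = (q.sum(axis=0) + smooth) / (n_items + n_classes * smooth)
        pi = np.full((n_workers, n_classes, n_classes), smooth)
        for i in range(n_items):
            for j in range(n_workers):
                if observed[i, j]:
                    pi[j, :, labels[i, j]] += q[i]
        pi /= pi.sum(axis=2, keepdims=True)

        # E-step: posterior over each item's true label under current params.
        log_q = np.tile(np.log(prior), (n_items, 1))
        for i in range(n_items):
            for j in range(n_workers):
                if observed[i, j]:
                    log_q[i] += np.log(pi[j, :, labels[i, j]])
        log_q -= log_q.max(axis=1, keepdims=True)  # stabilize before exp
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)

    return q.argmax(axis=1), q

# Hypothetical usage: 3 items, 3 workers, binary labels, one missing entry.
labels = np.array([[0, 0, 1],
                   [1, 1, 1],
                   [0, -1, 0]])
estimates, posteriors = dawid_skene(labels, n_classes=2)
```

Roughly speaking, plain majority voting corresponds to the degenerate case where every worker is assumed to have the same symmetric confusion matrix, which illustrates the paper's point that popular aggregation methods differ mainly in their choices for the parameters of a shared latent model.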
