A Permutation-Based Model for Crowd Labeling: Optimal Estimation and Robustness

The aggregation and denoising of crowd-labeled data has gained significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a high-dimensional non-asymptotic framework that allows both the number of workers and the number of tasks to scale, we derive minimax rates of convergence for the permutation-based model that are optimal up to logarithmic factors. We show that the richness of the permutation-based model offers significant robustness in estimation, while surprisingly incurring only a small additional statistical penalty compared to the Dawid-Skene model. We then design a computationally efficient method, called the OBI-WAN estimator, that is optimal (up to logarithmic factors) over a class intermediate between the permutation-based and Dawid-Skene models, and that simultaneously achieves non-trivial guarantees over the entire permutation-based model class. Finally, we conduct synthetic simulations and experiments on real-world crowdsourcing data, which corroborate our theoretical findings.
