Paper Information - A Permutation-Based Model for Crowd Labeling: Optimal Estimation and Robustness

The aggregation and denoising of crowd-labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a high-dimensional non-asymptotic framework that allows both the number of workers and the number of tasks to scale, we derive minimax rates of convergence for the permutation-based model that are optimal (up to logarithmic factors). We show that the permutation-based model offers significant robustness in estimation due to its richness, while surprisingly incurring only a small additional statistical penalty as compared to the Dawid-Skene model. We then design a computationally efficient method, called the OBI-WAN estimator, that is optimal over a class intermediate between the permutation-based and the Dawid-Skene models (up to logarithmic factors), and simultaneously achieves non-trivial guarantees over the entire permutation-based model class. Finally, we conduct synthetic simulations and experiments on real-world crowdsourcing data, and these corroborate our theoretical findings.
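The abstract does not spell out either model, so the following is only a minimal sketch under stated assumptions, not the paper's definition or the OBI-WAN estimator. It assumes binary labels in {-1, +1} and a matrix Q whose entry Q[i][j] is the probability that worker i answers task j correctly. A one-coin Dawid-Skene matrix is assumed to give each worker a single accuracy across all tasks, and the permutation-based class is assumed to contain every matrix that becomes non-decreasing along rows and columns after reordering workers and tasks. All function names, the product-form construction, and the majority-vote baseline are hypothetical illustrations.

```python
import random


def dawid_skene_matrix(worker_accuracy, num_tasks):
    """Assumed one-coin Dawid-Skene form: worker i has the same accuracy on every task."""
    return [[q] * num_tasks for q in worker_accuracy]


def skill_times_easiness_matrix(worker_skill, task_easiness):
    """Hypothetical richer matrix: accuracy grows with worker skill and task easiness."""
    return [[0.5 + 0.5 * s * e for e in task_easiness] for s in worker_skill]


def is_bimonotone_up_to_permutation(Q):
    """Check whether reordering rows and columns makes Q non-decreasing in both directions.

    If such an ordering exists, sorting rows and columns by their sums
    (ties broken lexicographically) recovers one, so a single check suffices.
    """
    rows = sorted(Q, key=lambda r: (sum(r), r))
    num_cols = len(rows[0])
    columns = [tuple(r[j] for r in rows) for j in range(num_cols)]
    order = sorted(range(num_cols), key=lambda j: (sum(columns[j]), columns[j]))
    M = [[r[j] for j in order] for r in rows]
    rows_ok = all(M[i][j] <= M[i][j + 1]
                  for i in range(len(M)) for j in range(num_cols - 1))
    cols_ok = all(M[i][j] <= M[i + 1][j]
                  for i in range(len(M) - 1) for j in range(num_cols))
    return rows_ok and cols_ok


def simulate_labels(Q, truth, rng):
    """Worker i reports the true label of task j with probability Q[i][j], else flips it."""
    return [[x if rng.random() < Q[i][j] else -x for j, x in enumerate(truth)]
            for i in range(len(Q))]


def majority_vote(labels):
    """Baseline aggregator (not the paper's estimator): sign of the column sum, ties to +1."""
    num_tasks = len(labels[0])
    return [1 if sum(row[j] for row in labels) >= 0 else -1 for j in range(num_tasks)]


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [rng.choice([-1, 1]) for _ in range(8)]
    Q = skill_times_easiness_matrix([0.2, 0.9, 0.5], [0.1 * k for k in range(1, 9)])
    estimate = majority_vote(simulate_labels(Q, truth, rng))
    errors = sum(a != b for a, b in zip(estimate, truth))
    print("bimonotone:", is_bimonotone_up_to_permutation(Q), "errors:", errors)
```

Under these assumptions every Dawid-Skene matrix passes the monotonicity check, since constant rows are trivially ordered, whereas the skill-times-easiness matrix passes it without having constant rows. This is one way to picture the sense in which the permutation-based class is the richer of the two.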

Martin J. Wainwright | Nihar B. Shah | Sivaraman Balakrishnan
