Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction

We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers. An unknown set of $\alpha n$ workers generate reliable ratings, while the remaining workers may behave arbitrarily and possibly adversarially. The manager of the experiment can also manually evaluate the quality of a small number of items, and wishes to curate a set that contains almost all of the high-quality items and at most an $\epsilon$ fraction of low-quality items. Perhaps surprisingly, we show that this is possible with an amount of work, for the manager and for each worker, that does not scale with $n$: the dataset can be curated with $\tilde{O}\Big(\frac{1}{\beta\alpha^3\epsilon^4}\Big)$ ratings per worker and $\tilde{O}\Big(\frac{1}{\beta\epsilon^2}\Big)$ ratings by the manager, where $\beta$ is the fraction of high-quality items. Our results extend to the more general setting of peer prediction, including peer grading in online classrooms.
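
To give a rough sense of scale, the two bounds can be evaluated at illustrative parameter values. The values $\alpha = \beta = 1/2$ and $\epsilon = 1/10$ below are assumptions chosen only for the arithmetic, and the constants and logarithmic factors hidden by $\tilde{O}(\cdot)$ are ignored, so these are orders of magnitude rather than actual rating counts:

$$
\frac{1}{\beta\alpha^3\epsilon^4} = \frac{1}{\tfrac{1}{2}\cdot\tfrac{1}{8}\cdot 10^{-4}} = 1.6\times 10^{5},
\qquad
\frac{1}{\beta\epsilon^2} = \frac{1}{\tfrac{1}{2}\cdot 10^{-2}} = 200.
$$

Neither quantity depends on $n$, so under these assumed values the per-worker and manager workloads stay the same whether the dataset has thousands or millions of items.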
