Paper information - Crowdsourcing with Sparsely Interacting Workers

We consider the estimation of worker skills from worker-task interaction data (with unknown labels) for the single-coin crowdsourcing binary classification model with symmetric noise. We define the (worker) interaction graph, whose nodes are workers and in which an edge between two nodes indicates that the two workers participated in a common task. We show that skills are asymptotically identifiable if and only if an appropriate limiting version of the interaction graph is irreducible and has odd cycles. We then formulate a weighted rank-one optimization problem to estimate skills from observations on an irreducible, aperiodic interaction graph. We propose a gradient descent scheme and show that, for such interaction graphs, the estimates converge asymptotically to the global minimum. We characterize the noise robustness of the gradient scheme in terms of spectral properties of the signless Laplacian of the interaction graph. We then demonstrate that a plug-in estimator based on the estimated skills achieves state-of-the-art performance on a number of real-world datasets. Our results have implications for the rank-one matrix completion problem: gradient descent can provably recover $W \times W$ rank-one matrices based on $W+1$ off-diagonal observations of a connected graph with a single odd cycle.
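
To make the rank-one formulation concrete, here is a minimal sketch of gradient descent on a toy interaction graph. It is not the paper's algorithm or code. It assumes that skills are parameterized as $s_i \in [-1, 1]$, so that the expected label correlation of two workers who share tasks is $s_i s_j$, and that the objective is the unweighted sum of $(C_{ij} - s_i s_j)^2$ over the edges of the interaction graph. The function name `estimate_skills`, the step size, the iteration count, the initialization, and the noise-free correlations `corr` are all illustrative assumptions.

```python
import numpy as np

def estimate_skills(edges, corr, num_workers, lr=0.05, steps=20000, init=0.5):
    """Gradient descent on sum over edges (i, j) of (corr_ij - s_i * s_j)^2.

    Hypothetical helper: unit weights and a constant positive initialization
    are assumptions made for this sketch, not choices taken from the paper.
    """
    s = np.full(num_workers, init, dtype=float)
    i_idx = np.array([i for i, _ in edges])
    j_idx = np.array([j for _, j in edges])
    for _ in range(steps):
        resid = corr - s[i_idx] * s[j_idx]  # one residual per observed edge
        grad = np.zeros(num_workers)
        # d/ds_i of (c_ij - s_i s_j)^2 is -2 * resid * s_j, and symmetrically for s_j
        np.add.at(grad, i_idx, -2.0 * resid * s[j_idx])
        np.add.at(grad, j_idx, -2.0 * resid * s[i_idx])
        s = s - lr * grad
    return s

# Toy graph: a triangle (an odd cycle) on workers 0, 1, 2 with a path 2-3-4 attached.
s_true = np.array([0.8, 0.6, 0.7, 0.5, 0.9])
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
# Noise-free pairwise correlations, one per edge (assumed for illustration).
corr = np.array([s_true[i] * s_true[j] for i, j in edges])

s_hat = estimate_skills(edges, corr, num_workers=5)
print(np.round(s_hat, 4))
```

Under this parameterization, $s$ and $-s$ fit the observed products equally well, so the sketch relies on a positive initialization to select the solution with positive skills. If the triangle were replaced by an even cycle, the graph would be bipartite and the products alone would no longer pin down the skills, which is consistent with the odd-cycle condition stated in the abstract.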
