Variational Inference for Crowdsourcing

Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al. [22], while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers' reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions.
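To make the aggregation task concrete, the sketch below contrasts plain majority voting with a simple EM-style estimator for binary labels. It is an illustration only, not the paper's BP or MF algorithm. It assumes a "one-coin" worker model, in which each worker is correct with a single probability `q`, together with a Beta(`alpha`, `beta`) prior on `q`. The function names `majority_vote` and `one_coin_em`, the input layout, and the default prior are hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def majority_vote(labels):
    """labels maps (item, worker) -> +1 or -1; returns {item: +1 or -1}."""
    totals = defaultdict(int)
    for (item, _worker), vote in labels.items():
        totals[item] += vote
    # Ties go to +1, an arbitrary choice for this sketch.
    return {item: 1 if total >= 0 else -1 for item, total in totals.items()}


def one_coin_em(labels, alpha=2.0, beta=1.0, iters=50, eps=1e-6):
    """MAP-EM under an assumed one-coin model with a Beta(alpha, beta) prior.

    Returns (estimated labels, estimated worker reliabilities).
    """
    by_item, by_worker = defaultdict(list), defaultdict(list)
    for (item, worker), vote in labels.items():
        by_item[item].append((worker, vote))
        by_worker[worker].append((item, vote))

    # mu[i] is the current belief that item i has true label +1,
    # initialised from the fraction of +1 votes (a soft majority vote).
    mu = {i: sum(v == 1 for _, v in votes) / len(votes)
          for i, votes in by_item.items()}
    q = {}
    for _ in range(iters):
        # M-step: MAP estimate of each worker's reliability given the beliefs.
        for j, votes in by_worker.items():
            agree = sum(mu[i] if v == 1 else 1.0 - mu[i] for i, v in votes)
            est = (agree + alpha - 1.0) / (len(votes) + alpha + beta - 2.0)
            q[j] = min(max(est, eps), 1.0 - eps)  # keep the logs finite
        # E-step: posterior over each item's label given the reliabilities.
        for i, votes in by_item.items():
            log_odds = sum(v * (math.log(q[j]) - math.log(1.0 - q[j]))
                           for j, v in votes)
            log_odds = min(max(log_odds, -30.0), 30.0)  # avoid overflow in exp
            mu[i] = 1.0 / (1.0 + math.exp(-log_odds))

    estimates = {i: 1 if p >= 0.5 else -1 for i, p in mu.items()}
    return estimates, q
```

Changing `alpha` and `beta` changes how strongly workers are assumed to be reliable before any labels are seen, which is the kind of prior choice the abstract identifies as critical to performance.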

Jian Peng | Qiang Liu | Alexander T. Ihler

[1] A. Zellner. An Introduction to Bayesian Inference in Econometrics, 1971.

[2] Miss A.O. Penney. (b), 1974, The New Yale Book of Quotations.

[3] A. P. Dawid, et al. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm, 1979.

[4] Béla Bollobás, et al. Random Graphs, 1985.

[5] Pietro Perona, et al. Inferring Ground Truth from Subjective Labelling of Venus Images, 1994, NIPS.

[6] L. Wasserman, et al. The Selection of Prior Distributions by Formal Rules, 1996.

[7] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[8] Michael I. Jordan. Graphical Models, 2003.

[9] William T. Freeman, et al. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs, 2001, IEEE Trans. Inf. Theory.

[10] X. Jin. Factor graphs and the Sum-Product Algorithm, 2002.

[11] Brendan T. O'Connor, et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.

[12] Alias-i, Inc. Multilevel Bayesian Models of Categorical Data Annotation, 2008.

[13] Michael I. Jordan, et al. Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.

[14] K. Mengersen, et al. A Comparison of Bayes–Laplace, Jeffreys, and Other Priors, 2008.

[15] Javier R. Movellan, et al. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise, 2009, NIPS.

[16] Nir Friedman, et al. Probabilistic Graphical Models - Principles and Techniques, 2009.

[17] Richard S. Zemel, et al. HOP-MAP: Efficient Message Passing with High Order Potentials, 2010, AISTATS.

[18] Pietro Perona, et al. The Multidimensional Wisdom of Crowds, 2010, NIPS.

[19] Gerardo Hermosillo, et al. Learning From Crowds, 2010, J. Mach. Learn. Res.

[20] A. Asuncion. Approximate Mean Field for Dirichlet-Based Models, 2010.

[21] Michael I. Jordan, et al. Bayesian Bias Mitigation for Crowdsourcing, 2011, NIPS.

[22] Devavrat Shah, et al. Iterative Learning for Reliable Crowdsourcing Systems, 2011, NIPS.

[23] Shipeng Yu, et al. Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks, 2012, J. Mach. Learn. Res.