Dense limit of the Dawid–Skene model for crowdsourcing and regions of sub-optimality of message passing algorithms

Crowdsourcing is a strategy to categorize data through the contributions of many individuals. A wide range of theoretical and algorithmic contributions are based on the model of Dawid and Skene [3]. Recently it was shown in [8,17] that, in certain regimes, belief propagation is asymptotically optimal for data generated from the Dawid–Skene model. Motivated by this progress, we analyze the dense limit of the Dawid–Skene model. We show that it belongs to a larger class of low-rank matrix estimation problems for which the asymptotic Bayes-optimal performance can be expressed in a simple closed form. In the dense limit, the mapping to low-rank matrix estimation also yields an approximate message passing algorithm for the problem. We identify the regions of parameters where this algorithm efficiently computes the Bayes-optimal estimates. Our analysis refines the results of [8,17] on the optimality of message passing algorithms by characterizing regions of parameters where these algorithms do not match the Bayes-optimal performance. We further study numerically the performance of approximate message passing, derived in the dense limit, on sparse instances and carry out experiments on a real-world dataset.

[1]  R. Palmer, et al. Solution of 'Solvable model of a spin glass', 1977.

[2]  Toshiyuki Tanaka, et al. Low-rank matrix reconstruction and clustering via approximate message passing, 2013, NIPS.

[3]  A. P. Dawid, et al. Maximum Likelihood Estimation of Observer Error‐Rates Using the EM Algorithm, 1979.

[4]  Florent Krzakala, et al. MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel, 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[5]  Bin Bi, et al. Iterative Learning for Reliable Crowdsourcing Systems, 2012.

[6]  Adel Javanmard, et al. State Evolution for General Approximate Message Passing Algorithms, with Applications to Spatial Coupling, 2012, ArXiv.

[7]  E. Bolthausen. An Iterative Construction of Solutions of the TAP Equations for the Sherrington–Kirkpatrick Model, 2012, 1201.2891.

[8]  Jinwoo Shin, et al. Optimal Inference in Crowdsourced Classification via Belief Propagation, 2016, IEEE Transactions on Information Theory.

[9]  Florent Krzakala, et al. Statistical physics of inference: thresholds and algorithms, 2015, ArXiv.

[10]  Pietro Perona, et al. The Multidimensional Wisdom of Crowds, 2010, NIPS.

[11]  Nicolas Macris, et al. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula, 2016, NIPS.

[12]  Gerardo Hermosillo, et al. Learning From Crowds, 2010, J. Mach. Learn. Res.

[13]  Léo Miolane. Fundamental limits of low-rank matrix estimation, 2017, 1702.00473.

[14]  S. Kirkpatrick, et al. Solvable Model of a Spin-Glass, 1975.

[15]  D. Rubin, et al. Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper, 1977.

[16]  Marc Lelarge, et al. Recovering Asymmetric Communities in the Stochastic Block Model, 2018, IEEE Transactions on Network Science and Engineering.

[17]  Jinwoo Shin, et al. Optimality of Belief Propagation for Crowdsourced Classification, 2016, ICML.

[18]  Sundeep Rangan, et al. Iterative estimation of constrained rank-one matrices in noise, 2012, 2012 IEEE International Symposium on Information Theory Proceedings.

[19]  Andrea Montanari, et al. Information-theoretically optimal sparse PCA, 2014, 2014 IEEE International Symposium on Information Theory.

[20]  Andrea Montanari, et al. Asymptotic mutual information for the binary stochastic block model, 2016, 2016 IEEE International Symposium on Information Theory (ISIT).

[21]  Florent Krzakala, et al. Constrained low-rank matrix estimation: phase transitions, approximate message passing and applications, 2017, ArXiv.

[22]  Jian Peng, et al. Variational Inference for Crowdsourcing, 2012, NIPS.