Robust Plackett–Luce model for k-ary crowdsourced preferences

The aggregation of k-ary preferences is an emerging ranking problem that arises in many everyday settings, such as ordinal peer grading and online product recommendation. Crowdsourcing has become a popular way to collect large numbers of k-ary preferences for this problem, thanks to convenient platforms and low costs. However, k-ary preferences from crowdsourced workers are often noisy, which inevitably degrades the performance of traditional aggregation models. To address this challenge, we present a RObust PlAckett–Luce (ROPAL) model. To ensure robustness, ROPAL couples the Plackett–Luce model with a denoising vector that, based on the Kendall-tau distance, corrects each k-ary crowdsourced preference with a certain probability. In addition, we propose an online Bayesian inference algorithm that makes ROPAL scalable to large-scale preferences. Comprehensive experiments on large synthetic and real-world datasets show that ROPAL with online Bayesian inference achieves substantial improvements over existing approaches in both robustness and noisy-worker detection.

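Since the abstract rests on two standard ingredients, the Plackett–Luce likelihood and the Kendall-tau distance, a minimal sketch of both may help. The snippet below is illustrative only: the function names and toy data are our own, and it does not reproduce the paper's denoising vector or its online Bayesian inference.

```python
import numpy as np

def plackett_luce_log_likelihood(scores, ranking):
    """Log-likelihood of a k-ary ranking under the Plackett-Luce model.

    scores:  positive utility for every item (index = item id).
    ranking: item ids ordered from most to least preferred.
    """
    log_lik = 0.0
    for i in range(len(ranking)):
        remaining = ranking[i:]  # items still available at stage i
        # Each stage is a softmax-style choice among the remaining items.
        log_lik += np.log(scores[ranking[i]]) - np.log(scores[remaining].sum())
    return log_lik

def kendall_tau_distance(a, b):
    """Number of discordant item pairs between two rankings of the same items."""
    pos_b = {item: idx for idx, item in enumerate(b)}
    dist = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if pos_b[a[i]] > pos_b[a[j]]:  # this pair is ordered differently in b
                dist += 1
    return dist

# Toy usage: 4 items, one worker's 3-ary preference.
scores = np.array([2.0, 1.0, 0.5, 0.25])
ranking = np.array([0, 2, 1])  # worker prefers item 0 > item 2 > item 1
print(plackett_luce_log_likelihood(scores, ranking))
print(kendall_tau_distance([0, 2, 1], [0, 1, 2]))  # one discordant pair -> 1
```

The Plackett–Luce likelihood factorizes a k-ary preference into k sequential choices, each picking the top item among those not yet ranked; the Kendall-tau distance counts pairwise disagreements between two rankings, which is the notion of closeness the paper's denoising vector relies on.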