Online Top-K Selection in Crowdsourcing Environments

Identifying the top-K items in a set is a problem with applications in many areas, such as recommender systems, social review platforms, online contests, and web search. Crowdsourcing provides an effective, low-cost way to collect input for such tasks and has attracted significant attention. We consider top-K problems in which the goal is to select the set of top-K elements, regardless of their internal ordering. Past algorithms for top-K problems were generally based on a global sort, which performs unnecessary work ordering elements that are all selected or all rejected. The exceptions are specialized crowdsourcing algorithms proposed specifically for top-K selection; these algorithms, however, require a fixed amount of work to be performed and produce no useful intermediate answer. We propose a dynamic top-K selection algorithm that uses crowdsourced comparisons to progressively classify items as either selected (in the top-K) or rejected. As the comparisons proceed, more and more elements are classified; an intermediate result can be provided at any time by returning the already-accepted items together with the best of those still unclassified. We show that the algorithm we develop is efficient and robust to comparison noise. We evaluate its performance both analytically and experimentally, and show that it achieves precision in top-K selection comparable to that of previous algorithms with less crowdsourcing work.
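The anytime behavior described above can be sketched in a few lines of Python. This is a minimal illustration only, not the paper's algorithm: the `threshold` margin rule for accepting and rejecting items is a hypothetical win-count heuristic chosen for simplicity, while the paper develops a principled classification procedure.

```python
class OnlineTopK:
    """Illustrative sketch of progressive top-K selection from noisy
    pairwise comparisons. The win-count margin rule below is a
    hypothetical stand-in for the paper's classification criterion."""

    def __init__(self, items, k, threshold=3):
        self.items = list(items)
        self.k = k
        self.threshold = threshold  # assumed confidence margin (illustrative)
        self.wins = {x: 0 for x in self.items}
        self.losses = {x: 0 for x in self.items}
        self.accepted = set()
        self.rejected = set()

    def record(self, winner, loser):
        """Feed one crowdsourced comparison outcome, then reclassify."""
        self.wins[winner] += 1
        self.losses[loser] += 1
        self._classify()

    def _classify(self):
        # Items with a large positive win margin are accepted early;
        # a large negative margin leads to rejection.
        for x in self.items:
            if x in self.accepted or x in self.rejected:
                continue
            margin = self.wins[x] - self.losses[x]
            if margin >= self.threshold and len(self.accepted) < self.k:
                self.accepted.add(x)
            elif margin <= -self.threshold:
                self.rejected.add(x)

    def intermediate_answer(self):
        """Anytime answer: already-accepted items plus the best-scoring
        unclassified items, filling up to k."""
        unclassified = [x for x in self.items
                        if x not in self.accepted and x not in self.rejected]
        unclassified.sort(key=lambda x: self.wins[x] - self.losses[x],
                          reverse=True)
        need = self.k - len(self.accepted)
        return sorted(self.accepted) + unclassified[:need]
```

The key property is that `intermediate_answer` is callable at any point during the comparison stream: early on it is dominated by the heuristic ranking of unclassified items, and as accepted items accumulate it converges to the final selection.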
