Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach

We study the problem of online rank elicitation, assuming that rankings of a set of alternatives obey the Plackett-Luce distribution. Following the setting of the dueling bandits problem, the learner is allowed to query pairwise comparisons between alternatives, i.e., to sample pairwise marginals of the distribution in an online fashion. Using this information, the learner seeks to reliably predict the most probable ranking (or top-alternative). Our approach is based on constructing a surrogate probability distribution over rankings based on a sorting procedure, for which the pairwise marginals provably coincide with the marginals of the Plackett-Luce distribution. In addition to a formal performance and complexity analysis, we present first experimental studies.
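The pairwise marginals mentioned in the abstract can be illustrated with a small sketch. It assumes the standard parameterization of the Plackett-Luce model by positive skill parameters, under which the probability that alternative i is ranked before alternative j is v_i / (v_i + v_j); this closed form is a well-known property of the model and is not stated in the abstract itself. The names `pl_probability`, `pairwise_marginal`, `most_probable_ranking` and the vector `v` are hypothetical, chosen only for illustration. The code enumerates all rankings by brute force, so it is suitable only for a handful of alternatives and is not the sorting-based procedure the paper proposes.

```python
from itertools import permutations


def pl_probability(ranking, v):
    """Plackett-Luce probability of `ranking` (best alternative first).

    `v` holds one positive skill parameter per alternative (an assumed,
    illustrative parameterization).
    """
    prob = 1.0
    remaining = sum(v[i] for i in ranking)
    for i in ranking:
        # Choose the next alternative proportionally to its skill
        # among those not yet placed.
        prob *= v[i] / remaining
        remaining -= v[i]
    return prob


def pairwise_marginal(i, j, v):
    """P(i is ranked before j), summed over every ranking (brute force)."""
    n = len(v)
    return sum(
        pl_probability(r, v)
        for r in permutations(range(n))
        if r.index(i) < r.index(j)
    )


def most_probable_ranking(v):
    """Mode of the distribution, found by exhaustive search."""
    return max(permutations(range(len(v))), key=lambda r: pl_probability(r, v))


# Hypothetical skill parameters for four alternatives.
v = [3.0, 2.0, 1.0, 0.5]

for i, j in [(0, 1), (1, 2), (2, 3)]:
    brute_force = pairwise_marginal(i, j, v)
    closed_form = v[i] / (v[i] + v[j])
    print(f"P({i} before {j}): enumerated={brute_force:.6f}, closed form={closed_form:.6f}")

print("most probable ranking:", most_probable_ranking(v))
```

In this sketch, the enumerated marginals agree with the closed form, and the most probable ranking orders the alternatives by decreasing skill. That is why a learner who can only sample pairwise comparisons can still hope to recover the most probable ranking or the top alternative.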

Eyke Hüllermeier | Balázs Szörényi | Róbert Busa-Fekete | Adil Paul
