Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach

We study the problem of online rank elicitation, assuming that rankings of a set of alternatives obey the Plackett-Luce distribution. Following the setting of the dueling bandits problem, the learner is allowed to query pairwise comparisons between alternatives, i.e., to sample pairwise marginals of the distribution in an online fashion. Using this information, the learner seeks to reliably predict the most probable ranking (or the top alternative). Our approach is based on constructing a surrogate probability distribution over rankings via a sorting procedure, whose pairwise marginals provably coincide with those of the Plackett-Luce distribution. In addition to a formal performance and complexity analysis, we present first experimental studies.