Parametric Graph for Unimodal Ranking Bandit

We tackle the online ranking problem of assigning L items to K positions on a web page in order to maximize the number of user clicks. We propose an original algorithm, easy to implement and with strong theoretical guarantees to tackle this problem in the Position-Based Model (PBM) setting, well suited for applications where items are displayed on a grid. Besides learning to rank, our algorithm, GRAB (for parametric Graph for unimodal RAnking Bandit), also learns the parameter of a compact graph over permutations of K items among L. The logarithmic regret bound of this algorithm is a direct consequence of the unimodality property of the bandit setting with respect to the learned graph. Experiments against state-ofthe-art learning algorithms which also tackle the PBM setting, show that our method is more efficient while giving regret performance on par with the best known algorithms on simulated and real life datasets.

[1]  Olivier Cappé,et al.  Multiple-Play Bandits in the Position-Based Model , 2016, NIPS.

[2]  Richard Combes,et al.  Unimodal Bandits with Continuous Arms: Order-optimal Regret without Smoothness , 2020, Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems.

[3]  Romaric Gaudel,et al.  Position-Based Multiple-Play Bandits with Thompson Sampling , 2020, ArXiv.

[4]  Filip Radlinski,et al.  Learning diverse rankings with multi-armed bandits , 2008, ICML '08.

[5]  M. de Rijke,et al.  Click Models for Web Search , 2015, Click Models for Web Search.

[6]  Zheng Wen,et al.  Cascading Bandits: Learning to Rank in the Cascade Model , 2015, ICML.

[7]  Bhaskar Krishnamachari,et al.  Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations , 2010, IEEE/ACM Transactions on Networking.

[8]  Matthew Richardson,et al.  Predicting clicks: estimating the click-through rate for new ads , 2007, WWW '07.

[9]  Alexandre Proutière,et al.  Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms , 2014, ICML.

[10]  Wtt Wtt Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits , 2015 .

[11]  Alexandre Proutière,et al.  Learning to Rank , 2015, SIGMETRICS.

[12]  Hiroshi Nakagawa,et al.  Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays , 2015, ICML.

[13]  Nick Craswell,et al.  An experimental comparison of click position-bias models , 2008, WSDM '08.

[14]  Akiko Takeda,et al.  Position-based Multiple-play Bandit Problem with Unknown Position Bias , 2017, NIPS.

[15]  Shuai Li,et al.  TopRank: A practical algorithm for online stochastic ranking , 2018, NeurIPS.

[16]  Robert E. Tarjan,et al.  On Minimum-Cost Assignments in Unbalanced Bipartite Graphs , 2012 .

[17]  M. de Rijke,et al.  BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback , 2018, UAI.