TopRank: A practical algorithm for online stochastic ranking

Online learning to rank is a sequential decision-making problem in which, in each round, the learning agent chooses a list of items and receives feedback in the form of clicks from the user. Many sample-efficient algorithms have been proposed for this problem, but they assume a specific click model connecting rankings and user behavior. We propose a generalized click model that encompasses many existing models, including the position-based and cascade models. Our generalization motivates a novel online learning algorithm based on topological sort, which we call TopRank. TopRank (a) is more natural than existing algorithms, (b) has stronger regret guarantees than existing algorithms of comparable generality, (c) has a more insightful proof that leaves the door open to many generalizations, and (d) outperforms existing algorithms empirically.
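The abstract does not reproduce the algorithm itself, but its core structural idea is ranking via topological sort: once the agent has gathered enough click evidence to conclude that item *a* is better than item *b*, any ranking it displays should place *a* above *b*. A minimal sketch of that ordering step is below; `topological_ranking` and the `better_than` pair list are illustrative stand-ins, not TopRank's actual statistical machinery for deciding when a pairwise comparison is resolved.

```python
from collections import defaultdict, deque

def topological_ranking(items, better_than):
    """Order items so every known 'a better than b' pair places a
    before b, via Kahn's algorithm on the induced partial order."""
    indegree = {v: 0 for v in items}
    succ = defaultdict(list)
    for a, b in better_than:
        succ[a].append(b)
        indegree[b] += 1
    # Items not yet known to be worse than anything go first.
    queue = deque(v for v in items if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return order

# Example: click evidence (hypothetical) says item 1 beats 2 and 3.
print(topological_ranking([1, 2, 3], [(1, 2), (1, 3)]))  # → [1, 2, 3]
```

Items that are not yet comparable remain unordered relative to each other, which is what lets the learner keep exploring among them while still respecting every relation it has confidently identified.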
