Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model

Non-stationarity arises in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences change abruptly at an unknown moment in time. We consider the problem of identifying the K most attractive items and propose cascading non-stationary bandits, an online learning variant of the cascade model, in which a user browses a ranked list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving this non-stationary problem: CascadeDUCB and CascadeSWUCB. We analyze their performance and derive gap-dependent upper bounds on their n-step regret. We also establish a lower bound on the regret for cascading non-stationary bandits and show that both algorithms match this lower bound up to a logarithmic factor. Finally, we evaluate both algorithms on a real-world web search click dataset.
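To make the setting concrete, the following is a minimal sketch of a sliding-window UCB learner for the cascade model in the spirit of CascadeSWUCB, not the paper's exact algorithm: the window length `tau`, the exploration constant `eta`, the confidence-bonus form, and the toy piecewise-stationary simulation are all assumptions made for illustration.

```python
"""
Minimal sketch of a sliding-window UCB learner under cascade-model feedback:
the user scans the ranked list top to bottom and clicks the first attractive
item. Constants and constructions are illustrative, not the paper's.
"""

import math
import random
from collections import deque


class CascadeSWUCB:
    def __init__(self, n_items, K, tau=1000, eta=0.5):
        self.n_items = n_items        # number of candidate items L
        self.K = K                    # length of the recommended list
        self.tau = tau                # sliding-window length (in rounds), assumed
        self.eta = eta                # exploration constant, assumed
        self.t = 0                    # current round
        self.history = deque()        # (round, item, reward) tuples inside the window
        self.counts = [0] * n_items   # windowed observation counts per item
        self.clicks = [0] * n_items   # windowed click counts per item

    def rank(self):
        """Return the K items with the highest sliding-window UCBs."""
        self.t += 1
        log_term = math.log(min(self.t, self.tau) + 1)
        ucbs = []
        for i in range(self.n_items):
            if self.counts[i] == 0:
                ucbs.append(float("inf"))   # force initial exploration
            else:
                mean = self.clicks[i] / self.counts[i]
                ucbs.append(mean + math.sqrt(self.eta * log_term / self.counts[i]))
        order = sorted(range(self.n_items), key=lambda i: -ucbs[i])
        return order[:self.K]

    def update(self, ranked_list, click_pos):
        """click_pos is the position of the first click, or None if no click."""
        observed = ranked_list if click_pos is None else ranked_list[:click_pos + 1]
        for pos, item in enumerate(observed):
            reward = 1 if pos == click_pos else 0
            self.history.append((self.t, item, reward))
            self.counts[item] += 1
            self.clicks[item] += reward
        # Discard observations that have fallen out of the sliding window.
        while self.history and self.history[0][0] <= self.t - self.tau:
            _, old_item, old_reward = self.history.popleft()
            self.counts[old_item] -= 1
            self.clicks[old_item] -= old_reward


if __name__ == "__main__":
    # Toy piecewise-stationary environment: attraction probabilities are
    # reversed at a change point that is unknown to the learner.
    random.seed(0)
    n_items, K, horizon, change_point = 8, 3, 4000, 2000
    probs_before = [0.9, 0.8, 0.7, 0.2, 0.2, 0.1, 0.1, 0.1]
    probs_after = list(reversed(probs_before))

    learner = CascadeSWUCB(n_items, K, tau=500, eta=0.5)
    total_clicks = 0
    for t in range(horizon):
        probs = probs_before if t < change_point else probs_after
        ranked = learner.rank()
        # Cascade feedback: first attractive item in the list is clicked.
        click_pos = next(
            (pos for pos, item in enumerate(ranked) if random.random() < probs[item]),
            None,
        )
        total_clicks += click_pos is not None
        learner.update(ranked, click_pos)
    print(f"click-through rate: {total_clicks / horizon:.3f}")
```

A discounted-UCB variant in the spirit of CascadeDUCB would follow the same template but replace the hard window with exponentially discounted counts, so that older observations are down-weighted rather than dropped.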
