Online Learning to Rank in Stochastic Click Models

Online learning to rank is a core problem in information retrieval and machine learning. Many provably efficient algorithms have recently been proposed for this problem in specific click models, where a click model describes how the user interacts with a list of documents. Although these results are significant, their practical impact is limited, because each proposed algorithm is designed for a specific click model and lacks convergence guarantees in other models. In this work, we propose BatchRank, the first online learning to rank algorithm for a broad class of click models. The class encompasses the two most fundamental click models, the cascade and position-based models. We derive a gap-dependent upper bound on the T-step regret of BatchRank and evaluate it on a range of web search queries. We observe that BatchRank outperforms ranked bandits and is more robust than CascadeKL-UCB, an existing algorithm for the cascade model.
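To make the two click models named above concrete, the following is a minimal sketch of how each could generate clicks on a ranked list. It uses the standard textbook definitions of the cascade and position-based models and is not code from this work. The function names, the `attraction` and `examination` parameters, and the example values are assumptions introduced for illustration.

```python
import random


def cascade_clicks(ranked, attraction, rng):
    """Cascade model (illustrative): the user scans the list top-down,
    clicks the first attractive document, and then stops."""
    clicks = [0] * len(ranked)
    for pos, doc in enumerate(ranked):
        if rng.random() < attraction[doc]:
            clicks[pos] = 1
            break  # at most one click per list
    return clicks


def position_based_clicks(ranked, attraction, examination, rng):
    """Position-based model (illustrative): the document at position k is
    clicked if position k is examined and the document is attractive,
    independently across positions."""
    clicks = []
    for pos, doc in enumerate(ranked):
        examined = rng.random() < examination[pos]
        attracted = rng.random() < attraction[doc]
        clicks.append(int(examined and attracted))
    return clicks


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical attraction probabilities per document id.
    attraction = {"a": 0.8, "b": 0.5, "c": 0.1}
    # Hypothetical examination probabilities per position.
    examination = [1.0, 0.6, 0.3]
    ranked = ["a", "b", "c"]
    print(cascade_clicks(ranked, attraction, rng))
    print(position_based_clicks(ranked, attraction, examination, rng))
```

In the cascade sketch a list receives at most one click, whereas the position-based sketch can produce several clicks per list. An algorithm tailored to one of these feedback patterns need not have guarantees under the other, which is the limitation the abstract attributes to prior work.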
