A Contextual-Bandit Approach to Online Learning to Rank for Relevance and Diversity

Online learning to rank (LTR) focuses on learning a policy from user interactions that builds a list of items sorted in decreasing order of item utility. It is a core technique in modern interactive systems, such as search engines, recommender systems, and conversational assistants. Previous online LTR approaches either assume the relevance of an item in the list to be independent of the other items in the list, or assume the relevance of an item to be a submodular function of the items ranked above it. The former type of approach may produce a list of low diversity, in which the relevant items all cover the same aspects; the latter may produce a highly diversified list that contains non-relevant items. In this paper, we study an online LTR problem that considers both item relevance and topical diversity. We assume cascading user behavior: a user browses the displayed list of items from top to bottom, clicks the first attractive item, and stops browsing the rest. We propose a hybrid contextual bandit approach, called CascadeHybrid, for solving this problem. CascadeHybrid models item relevance and topical diversity with two independent functions and learns both functions simultaneously from user click feedback. We derive a gap-free bound on the n-step regret of CascadeHybrid. We evaluate CascadeHybrid experimentally on the MovieLens and Yahoo! Music datasets, and our results show that it outperforms the baselines on both datasets.
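To make the setup above concrete, the sketch below illustrates one way a hybrid cascading bandit of this kind could be organized: a linear relevance term plus a linear weighting of a submodular topic-coverage gain, greedy optimistic (UCB-style) list construction, and ridge-regression updates from cascade feedback in which items above the first click are treated as examined non-clicks. This is a minimal illustration under our own assumptions; the class name HybridCascadeRanker, the parameters alpha and lam, and the max-style coverage gain are illustrative simplifications, not the paper's exact CascadeHybrid algorithm.

```python
import numpy as np


class HybridCascadeRanker:
    """Illustrative hybrid cascading bandit (simplified; not the exact CascadeHybrid)."""

    def __init__(self, d_rel, d_topic, alpha=1.0, lam=1.0):
        d = d_rel + d_topic                  # joint dimension: relevance + diversity parts
        self.alpha = alpha                   # exploration weight (assumed hyperparameter)
        self.A = lam * np.eye(d)             # ridge-regression Gram matrix
        self.b = np.zeros(d)                 # ridge-regression response vector

    def _features(self, x_rel, topics, covered):
        # Marginal topic-coverage gain given the topics already covered by
        # higher-ranked items: a simple submodular (diminishing-returns) gain.
        gain = np.maximum(topics - covered, 0.0)
        return np.concatenate([x_rel, gain])

    def rank(self, items, k):
        """Greedily build a k-item list; items is a list of (x_rel, topics) pairs."""
        theta = np.linalg.solve(self.A, self.b)   # ridge estimate of the joint parameter
        A_inv = np.linalg.inv(self.A)
        covered = np.zeros_like(items[0][1])
        remaining = list(range(len(items)))
        chosen, chosen_feats = [], []
        for _ in range(min(k, len(items))):
            best_i, best_z, best_ucb = None, None, -np.inf
            for i in remaining:
                z = self._features(items[i][0], items[i][1], covered)
                ucb = z @ theta + self.alpha * np.sqrt(z @ A_inv @ z)
                if ucb > best_ucb:
                    best_i, best_z, best_ucb = i, z, ucb
            chosen.append(best_i)
            chosen_feats.append(best_z)
            remaining.remove(best_i)
            covered = np.maximum(covered, items[best_i][1])
        return chosen, chosen_feats

    def update(self, chosen_feats, click_pos):
        """Cascade feedback: positions above the first click are examined non-clicks;
        positions below the click are treated as unobserved."""
        last = click_pos if click_pos is not None else len(chosen_feats) - 1
        for pos in range(last + 1):
            z = chosen_feats[pos]
            reward = 1.0 if pos == click_pos else 0.0
            self.A += np.outer(z, z)
            self.b += reward * z


# Toy usage: 3-dim relevance features, 4 topics, rank 2 out of 5 candidate items.
rng = np.random.default_rng(0)
ranker = HybridCascadeRanker(d_rel=3, d_topic=4)
items = [(rng.random(3), rng.integers(0, 2, 4).astype(float)) for _ in range(5)]
ranking, feats = ranker.rank(items, k=2)
ranker.update(feats, click_pos=1)   # the user clicked the second displayed item
```

The greedy construction mirrors the standard treatment of submodular coverage gains in diversified ranking, while the optimistic score and least-squares update follow the usual linear-bandit recipe; both are stand-ins for the concrete choices made in the paper.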
