Contextual Dependent Click Bandit Algorithm for Web Recommendation

In recommendation systems, there has been increasing emphasis on recommending potentially novel and interesting items in addition to confirmed attractive ones. In this paper, we propose a contextual bandit algorithm for web page recommendation under the dependent click model (DCM), which takes user and web page features into account and automatically balances exploration and exploitation. Moreover, unlike many previous contextual bandit algorithms that assume the click-through rate is a linear function of the features, we enhance representational power by adopting generalized linear models, which include both linear and logistic regression and have shown stronger performance in many binary-reward applications. We prove an upper bound of \(\tilde{O}(d\sqrt{n})\) on the regret of the proposed algorithm. Experiments on both synthetic and real-world data demonstrate significant advantages of our algorithm.
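To make the approach concrete, below is a minimal sketch of a GLM-based UCB contextual bandit of the kind described above, assuming a logistic link and a feature vector per candidate item. The class name, the crude Newton refit, the exploration width `alpha`, and the simplified treatment of DCM click feedback (updating on every displayed item's click indicator) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GLMDCMBandit:
    """Hypothetical sketch: a UCB-style contextual bandit with a
    logistic-link GLM, in the spirit of the algorithm described in
    the abstract. Names and update rules are illustrative only."""

    def __init__(self, d, alpha=1.0, lam=1.0):
        self.d = d                      # feature dimension
        self.alpha = alpha              # exploration width (assumed)
        self.A = lam * np.eye(d)        # regularized Gram matrix
        self.theta = np.zeros(d)        # GLM parameter estimate
        self.X, self.y = [], []         # observed (feature, click) history

    def select(self, features, k):
        """Rank items by optimistic attraction estimates and return
        the indices of the top-k items to display."""
        A_inv = np.linalg.inv(self.A)
        ucb = np.array([
            sigmoid(x @ self.theta) + self.alpha * np.sqrt(x @ A_inv @ x)
            for x in features
        ])
        return np.argsort(-ucb)[:k]

    def update(self, features, clicks):
        """Record displayed items with their click indicators, then
        refit theta by a few Newton steps on the L2-regularized
        logistic log-likelihood (a simplification of DCM feedback)."""
        for x, c in zip(features, clicks):
            self.A += np.outer(x, x)
            self.X.append(x)
            self.y.append(c)
        X, y = np.array(self.X), np.array(self.y)
        for _ in range(5):              # crude Newton refit
            p = sigmoid(X @ self.theta)
            grad = X.T @ (p - y) + self.theta
            W = p * (1.0 - p)
            H = (X * W[:, None]).T @ X + np.eye(self.d)
            self.theta -= np.linalg.solve(H, grad)
```

The optimistic term \(\alpha\sqrt{x^\top A^{-1} x}\) plays the usual confidence-width role from linear and GLM bandit analyses; a faithful DCM implementation would additionally model position-dependent termination, which this sketch omits.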
