Transferable Contextual Bandit for Cross-Domain Recommendation

Traditional recommendation systems (RecSys) suffer from two problems: the exploitation-exploration dilemma and the cold-start problem. One solution to the exploitation-exploration dilemma is the contextual bandit policy, which adaptively exploits and explores user interests and thereby achieves higher rewards in the long run. The contextual bandit policy, however, may cause the system to explore more than needed in cold-start situations, which can hurt short-term rewards. Cross-domain RecSys methods adopt transfer learning to leverage prior knowledge from a source RecSys domain to jump-start the cold-start target RecSys. To solve the two problems together, in this paper we propose the first applicable transferable contextual bandit (TCB) policy for cross-domain recommendation. TCB not only benefits exploitation but also accelerates exploration in the target RecSys, and TCB’s exploration, in turn, helps it learn how to transfer between different domains. TCB is a general algorithm for both homogeneous and heterogeneous domains. We provide both a theoretical regret analysis and empirical experiments; the empirical results show that TCB outperforms state-of-the-art algorithms over time.
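
As a rough illustration of the contextual bandit machinery the abstract builds on, below is a minimal sketch of the standard LinUCB policy (Li et al., WWW 2010), the single-domain baseline that TCB extends with cross-domain transfer. This is not the authors' TCB algorithm: the exploration weight `alpha`, the simulated linear reward model, and all variable names are illustrative assumptions.

```python
import numpy as np

# A minimal LinUCB sketch (per-arm linear model with a UCB exploration
# bonus). The reward simulator below is a toy assumption for this demo.

rng = np.random.default_rng(0)
d, n_arms, alpha, T = 5, 4, 1.0, 2000

# Per-arm ridge-regression statistics: A_a = I + sum(x x^T), b_a = sum(r x).
A = np.stack([np.eye(d) for _ in range(n_arms)])
b = np.zeros((n_arms, d))

# Hidden per-arm preference vectors, used only to simulate payoffs.
theta_true = rng.normal(size=(n_arms, d))

total_reward = 0.0
for t in range(T):
    x = rng.normal(size=d)              # context (e.g. user features)
    x /= np.linalg.norm(x)
    ucb = np.empty(n_arms)
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]        # ridge estimate of arm a's parameters
        # Exploitation term (estimated payoff) + exploration bonus
        # (confidence width of the estimate in direction x).
        ucb[a] = theta_hat @ x + alpha * np.sqrt(x @ A_inv @ x)
    a = int(np.argmax(ucb))             # play the optimistic arm
    r = theta_true[a] @ x + 0.1 * rng.normal()  # noisy simulated payoff
    A[a] += np.outer(x, x)              # update sufficient statistics
    b[a] += r * x
    total_reward += r

print(f"average reward over {T} rounds: {total_reward / T:.3f}")
```

The exploration bonus shrinks as an arm accumulates observations, which is exactly the adaptive exploit/explore trade-off the abstract describes; in a cold-start target domain every arm starts with a wide bonus, which is the over-exploration problem that transfer from a source domain is meant to mitigate.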
