Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling

Recommendation is crucial in both academia and industry, and various techniques are proposed such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. However, most of the previous studies suffer from two limitations: (1) considering the recommendation as a static procedure and ignoring the dynamic interactive nature between users and the recommender systems, (2) focusing on the immediate feedback of recommended items and neglecting the long-term rewards. To address the two limitations, in this paper we propose a novel recommendation framework based on deep reinforcement learning, called DRR. The DRR framework treats recommendation as a sequential decision making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between the users and recommender systems, which can consider both the dynamic adaptation and long-term rewards. Furthermore, a state representation module is incorporated into DRR, which can explicitly capture the interactions between items and users. Three instantiation structures are developed. Extensive experiments on four real-world datasets are conducted under both the offline and online evaluation settings. The experimental results demonstrate the proposed DRR method indeed outperforms the state-of-the-art competitors.

[1]  Richard Evans,et al.  Deep Reinforcement Learning in Large Discrete Action Spaces , 2015, 1512.07679.

[2]  Yehuda Koren,et al.  Factorization meets the neighborhood: a multifaceted collaborative filtering model , 2008, KDD.

[3]  Patrick Seemann,et al.  Matrix Factorization Techniques for Recommender Systems , 2014 .

[4]  Jun Wang,et al.  Product-Based Neural Networks for User Response Prediction , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[5]  Jun Wang,et al.  Real-Time Bidding by Reinforcement Learning in Display Advertising , 2017, WSDM.

[6]  Weinan Zhang,et al.  Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising , 2018, CIKM.

[7]  Qing Wang,et al.  Online Context-Aware Recommendation with Time Varying Multi-Armed Bandit , 2016, KDD.

[8]  Liang Zhang,et al.  Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning , 2018, KDD.

[9]  Huazheng Wang,et al.  Learning Hidden Features for Contextual Bandits , 2016, CIKM.

[10]  Wei Zeng,et al.  Adapting Markov Decision Process for Search Result Diversification , 2017, SIGIR.

[11]  Steffen Rendle,et al.  Factorization Machines , 2010, 2010 IEEE International Conference on Data Mining.

[12]  Heng-Tze Cheng,et al.  Wide & Deep Learning for Recommender Systems , 2016, DLRS@RecSys.

[13]  Greg Linden,et al.  Amazon . com Recommendations Item-to-Item Collaborative Filtering , 2001 .

[14]  Jun Wang,et al.  Unifying user-based and item-based collaborative filtering approaches by similarity fusion , 2006, SIGIR.

[15]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[16]  Wei Chu,et al.  A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.

[17]  Tom Schaul,et al.  Prioritized Experience Replay , 2015, ICLR.

[18]  Ruslan Salakhutdinov,et al.  Probabilistic Matrix Factorization , 2007, NIPS.

[19]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[20]  Jiafeng Guo,et al.  Reinforcement Learning to Rank with Markov Decision Process , 2017, SIGIR.

[21]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[22]  Peter Sunehag,et al.  Reinforcement Learning in Large Discrete Action Spaces , 2015, ArXiv.

[23]  Huazheng Wang,et al.  Factorization Bandits for Interactive Recommendation , 2017, AAAI.

[24]  Liang Zhang,et al.  Deep Reinforcement Learning for List-wise Recommendations , 2017, ArXiv.

[25]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[26]  Jun Wang,et al.  Interactive collaborative filtering , 2013, CIKM.

[27]  George Karypis,et al.  Item-based top-N recommendation algorithms , 2004, TOIS.

[28]  Ahmad A. Kardan,et al.  A hybrid web recommender system based on Q-learning , 2008, SAC '08.

[29]  Martin Wattenberg,et al.  Ad click prediction: a view from the trenches , 2013, KDD.

[30]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[31]  Yong Yu,et al.  Efficient Architecture Search by Network Transformation , 2017, AAAI.

[32]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[33]  Yunming Ye,et al.  DeepFM: A Factorization-Machine based Neural Network for CTR Prediction , 2017, IJCAI.

[34]  Lihong Li,et al.  An Empirical Evaluation of Thompson Sampling , 2011, NIPS.

[35]  Guy Shani,et al.  An MDP-Based Recommender System , 2002, J. Mach. Learn. Res..

[36]  Chih-Jen Lin,et al.  Field-aware Factorization Machines for CTR Prediction , 2016, RecSys.

[37]  Loriene Roy,et al.  Content-based book recommending using learning for text categorization , 1999, DL '00.

[38]  Jun Wang,et al.  Deep Learning over Multi-field Categorical Data - - A Case Study on User Response Prediction , 2016, ECIR.

[39]  Yujing Hu,et al.  Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application , 2018, KDD.

[40]  Nicholas Jing Yuan,et al.  DRN: A Deep Reinforcement Learning Framework for News Recommendation , 2018, WWW.