Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling

Recommendation is crucial in both academia and industry, and various techniques have been proposed, such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. However, most previous studies suffer from two limitations: (1) they treat recommendation as a static procedure and ignore the dynamic, interactive nature of the relationship between users and recommender systems; (2) they focus on the immediate feedback on recommended items and neglect long-term rewards. To address these two limitations, in this paper we propose a novel recommendation framework based on deep reinforcement learning, called DRR. The DRR framework treats recommendation as a sequential decision-making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between users and recommender systems, which accounts for both dynamic adaptation and long-term rewards. Furthermore, a state representation module is incorporated into DRR to explicitly capture the interactions between items and users. Three instantiation structures of this module are developed. Extensive experiments on four real-world datasets are conducted under both offline and online evaluation settings. The experimental results demonstrate that the proposed DRR method outperforms the state-of-the-art competitors.
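To make the contrast between immediate feedback and long-term reward concrete, the following is a minimal sketch of the discounted return that a reinforcement learning recommender would optimize. The reward values, the discount factor `gamma`, and the function name `discounted_return` are illustrative assumptions and are not taken from the paper.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards weighted by gamma**t, accumulated from the last step back."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical user feedback over two four-step sessions (1.0 = positive, 0.0 = none).
greedy_session = [1.0, 0.0, 0.0, 0.0]   # strong first item, then the user disengages
patient_session = [0.0, 1.0, 1.0, 1.0]  # weak first item, sustained engagement after

print(discounted_return(greedy_session))
print(discounted_return(patient_session))
```

A method that looks only at the first step would prefer the first session, whereas one that optimizes the discounted return would prefer the second.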

Feng Liu | Yunming Ye | Xutao Li | Huifeng Guo | Ruiming Tang | Haokun Chen | Yuzhou Zhang
