End-to-End Deep Reinforcement Learning based Recommendation with Supervised Embedding

Research on reinforcement learning (RL) based recommendation has become a hot topic in the recommendation community, owing to recent advances in interactive recommender systems. Existing RL recommendation approaches can be summarized in a unified framework with three components: an embedding component (EC), a state representation component (SRC), and a policy component (PC). We find that EC cannot be trained well simultaneously with the other two components. Previous studies bypass this obstacle by pre-training EC and then fixing it, so their approaches are not truly end-to-end. More importantly, a pre-trained and fixed EC suffers from two inherent drawbacks: (1) pre-trained and fixed embeddings cannot model the evolving preferences of users and the changing item correlations in a dynamic environment; (2) pre-training is inconvenient in industrial applications. To address these problems, we propose an End-to-end Deep Reinforcement learning based Recommendation framework (EDRR). In this framework, a supervised learning signal is designed to smooth the gradient updates to EC, and three ways of incorporating this signal are introduced and compared. To the best of our knowledge, we are the first to address the training compatibility among the three components in RL based recommendation. Extensive experiments on three real-world datasets demonstrate that the proposed EDRR achieves end-to-end training for both policy-based and value-based RL models and delivers better performance than state-of-the-art methods.
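The three-component view can be made concrete with a small sketch. It is illustrative only and is not the EDRR implementation: the abstract does not specify the supervised signal or the three incorporation schemes. The sketch therefore assumes a click-prediction loss as the supervised signal, a simple weighted sum as the way to combine it with a REINFORCE-style loss, mean pooling for SRC, and a softmax over dot-product scores for PC. All names (`item_emb`, `state_representation`, `policy`, `joint_loss`, `weight`) are hypothetical.

```python
import math
import random

random.seed(0)

NUM_ITEMS, DIM = 5, 4

# Embedding component (EC): one trainable vector per item (hypothetical sizes).
item_emb = [[random.uniform(-0.1, 0.1) for _ in range(DIM)]
            for _ in range(NUM_ITEMS)]


def state_representation(history):
    """SRC (assumed form): average the embeddings of recently seen items."""
    return [sum(item_emb[i][d] for i in history) / len(history)
            for d in range(DIM)]


def policy(state):
    """PC (assumed form): softmax over state-item dot-product scores."""
    scores = [sum(s * e for s, e in zip(state, emb)) for emb in item_emb]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [x / total for x in exps]


def joint_loss(history, action, reward, clicked, weight=0.5):
    """RL loss plus a weighted supervised loss, both depending on EC.

    The weighted sum is an assumption, not one of the paper's three schemes.
    """
    state = state_representation(history)
    probs = policy(state)
    # REINFORCE-style surrogate loss for the chosen item.
    rl_loss = -reward * math.log(probs[action])
    # Assumed supervised signal: binary cross-entropy on click feedback.
    logit = sum(s * e for s, e in zip(state, item_emb[action]))
    p_click = 1.0 / (1.0 + math.exp(-logit))
    sl_loss = -(clicked * math.log(p_click)
                + (1 - clicked) * math.log(1.0 - p_click))
    return rl_loss + weight * sl_loss, rl_loss, sl_loss


total, rl, sl = joint_loss(history=[0, 2], action=3, reward=1.0, clicked=1)
print(total, rl, sl)
```

Because both loss terms are functions of `item_emb`, minimizing their sum would send gradients from the supervised term into EC alongside the RL gradients, which is the general idea the abstract describes; how EDRR actually routes and weights these gradients is not stated here.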
