Toward Simulating Environments in Reinforcement Learning Based Recommendations

With recent advances in Reinforcement Learning (RL), there has been tremendous interest in employing RL for recommender systems. RL-based recommender systems have two key advantages: (i) they can continuously update their recommendation strategies according to users' real-time feedback, and (ii) the optimal strategy maximizes the long-term reward from users, such as the total revenue of a recommendation session. However, directly training and evaluating a new RL-based recommendation algorithm requires collecting users' real-time feedback in the real system, which is time- and effort-consuming and could negatively affect users' experience. This calls for a user simulator that can mimic real users' behaviors, on which new recommendation algorithms can be pre-trained and evaluated. Simulating users' behaviors in a dynamic system faces immense challenges: (i) the underlying item distribution is complex, and (ii) historical logs for each user are limited. In this paper, we develop a user simulator based on a Generative Adversarial Network (GAN). Specifically, we design the generator to capture the underlying distribution of users' historical logs and to generate realistic logs that can be considered augmentations of real logs, while the discriminator is developed not only to distinguish real from fake logs but also to predict users' behaviors. Experimental results on real-world e-commerce data demonstrate the effectiveness of the proposed simulator. Further experiments examine the importance of each component of the simulator.
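
The dual role of the discriminator described above (telling real logs from generated ones while also predicting user behavior) can be illustrated with a small sketch. This is not taken from the abstract: it assumes, purely for illustration, that the discriminator emits 2K logits, where the first K correspond to "real log with feedback type k" and the last K to "generated log with feedback type k". The value of K, the feedback labels, the function names, and the example logits are all hypothetical.

```python
import numpy as np

K = 3  # hypothetical number of feedback types, e.g. skip / click / purchase


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def discriminator_head(logits):
    """Read two answers out of one 2K-way output (assumed layout).

    Returns the probability that each log is real (mass on the first K
    classes) and the most likely feedback type among the real classes.
    """
    probs = softmax(logits)
    p_real = probs[..., :K].sum(axis=-1)
    behavior = np.argmax(probs[..., :K], axis=-1)
    return p_real, behavior


# Made-up logits for two logs: the first looks real, the second looks generated.
logits = np.array([
    [2.0, 0.5, 0.1, -1.0, -1.0, -1.0],
    [-1.0, -1.0, -1.0, 0.2, 2.5, 0.3],
])
p_real, behavior = discriminator_head(logits)
print(p_real, behavior)
```

Under this assumed layout, a single softmax output supports both tasks: summing over the first K classes gives a real-versus-fake score for adversarial training, and the argmax within those classes gives the predicted user feedback that a simulator would return to a recommendation policy.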
