Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems

Reinforcement learning for recommendation (RL4Rec) methods are receiving increasing attention as an effective way to improve long-term user engagement. However, applying RL4Rec online comes with risks: exploration may lead to periods of detrimental user experience. Moreover, few researchers have access to real-world recommender systems. Simulations have been put forward as a solution: user feedback is simulated based on logged historical user data, which enables optimization and evaluation without running policies online. While simulators do not risk the user experience and are widely accessible, we identify an important limitation of existing simulation methods: they ignore the interaction biases present in logged user data, and consequently, these biases affect the resulting simulation. As a solution to this issue, we introduce a debiasing step in the simulation pipeline, which corrects for the biases present in the logged data before it is used to simulate user behavior. To evaluate the effects of bias on RL4Rec simulations, we propose a novel evaluation approach for simulators that considers the performance of policies optimized with the simulator. Our results reveal that the biases from logged data negatively impact the resulting policies unless they are corrected for with our debiasing method. While our debiasing method can be applied to any simulator, we make our complete pipeline publicly available as the Simulator for OFfline leArning and evaluation (SOFA): the first simulator that accounts for interaction biases prior to optimization and evaluation.
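
The abstract does not say which correction the debiasing step uses, so the following is only a minimal sketch of one common family of techniques, inverse propensity scoring (IPS), and not necessarily the method used in SOFA. It assumes that the chance of an interaction being logged grows with item popularity. The function names, the `power` parameter, and the toy log are all hypothetical and chosen for illustration.

```python
from collections import Counter

def popularity_propensities(interactions, power=0.5):
    """Assumed propensity model: the chance that a rating is observed
    grows with item popularity (relative count raised to `power`)."""
    counts = Counter(item for _, item, _ in interactions)
    max_count = max(counts.values())
    return {item: (c / max_count) ** power for item, c in counts.items()}

def ips_weighted_mean(interactions, propensities):
    """Self-normalized IPS estimate of the mean rating: each logged
    rating is weighted by the inverse of its observation propensity."""
    num = sum(rating / propensities[item] for _, item, rating in interactions)
    den = sum(1.0 / propensities[item] for _, item, _ in interactions)
    return num / den

# Hypothetical toy log of (user, item, rating) tuples; the popular item
# is over-represented relative to the niche item.
logged = [
    ("u1", "popular", 5), ("u2", "popular", 5),
    ("u3", "popular", 4), ("u4", "popular", 5),
    ("u1", "niche", 2),
]

naive = sum(rating for _, _, rating in logged) / len(logged)
props = popularity_propensities(logged)
debiased = ips_weighted_mean(logged, props)

print(f"naive mean rating:    {naive:.3f}")
print(f"IPS-weighted rating:  {debiased:.3f}")
```

In this toy log the naive average is dominated by the frequently logged item, while the IPS-weighted estimate gives more weight to the rarely observed one. A simulator fitted to corrected data of this kind would, under the stated assumption, be less likely to inherit the popularity skew of the log.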
