Keeping Dataset Biases out of the Simulation

Reinforcement learning for recommendation (RL4Rec) methods have received increasing attention as an effective way to improve long-term user engagement. However, applying RL4Rec online comes with risks: exploration may lead to periods of detrimental user experience. Moreover, few researchers have access to real-world recommender systems. Simulators have been put forward as a solution: user feedback is simulated from logged historical user data, enabling optimization and evaluation without running a system online. While simulators do not risk the user experience and are widely accessible, we identify an important limitation of existing simulation methods: they ignore the interaction biases present in logged user data, and consequently, these biases carry over into the resulting simulation. As a solution, we introduce a debiasing step in the simulation pipeline that corrects for the biases in the logged data before it is used to simulate user behavior. To evaluate the effects of bias on RL4Rec simulations, we propose a novel evaluation approach for simulators that considers the performance of policies optimized with the simulator. Our results reveal that the biases in logged data negatively affect the resulting policies unless corrected for with our debiasing method. While our debiasing method can be applied to any simulator, we make our complete pipeline publicly available as the Simulator for OFfline leArning and evaluation (SOFA): the first simulator that accounts for interaction biases prior to optimization and evaluation.
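
The abstract leaves the exact form of the debiasing step open. As one concrete illustration, the sketch below fits a preference model to missing-not-at-random (MNAR) logs with an inverse-propensity-scoring (IPS) correction, in the spirit of the recommendations-as-treatments line of work; the popularity-based propensity model, function names, and hyperparameters are illustrative assumptions, not SOFA's actual implementation.

```python
import numpy as np

def estimate_propensities(ratings):
    """Toy propensity model (an assumption, not SOFA's): the chance that a
    (user, item) pair is observed in the log is proportional to the item's
    popularity. Values are clipped away from zero to keep IPS weights bounded."""
    observed = ~np.isnan(ratings)
    item_counts = observed.sum(axis=0)
    p = item_counts / max(item_counts.max(), 1)
    return np.clip(np.broadcast_to(p, ratings.shape), 0.05, 1.0)

def debiased_preference_model(ratings, rank=8, lr=0.01, reg=0.1, epochs=50, seed=0):
    """Matrix factorization fit with an IPS-weighted squared loss: weighting
    each observed rating's error by 1/propensity gives an unbiased estimate
    of the loss over *all* user-item pairs, despite MNAR logging."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    inv_p = 1.0 / estimate_propensities(ratings)
    users, items = np.where(~np.isnan(ratings))
    for _ in range(epochs):
        for u, i in zip(users, items):
            err = ratings[u, i] - U[u] @ V[i]
            grad_u = inv_p[u, i] * err * V[i] - reg * U[u]
            grad_v = inv_p[u, i] * err * U[u] - reg * V[i]
            U[u] += lr * grad_u
            V[i] += lr * grad_v
    # The debiased rating estimates would then drive the simulated user feedback.
    return U @ V.T
```

In a full pipeline, the returned estimates would replace a naively fitted (and hence biased) preference model as the basis from which the simulator samples user responses.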

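The abstract also proposes judging a simulator by the policies it produces rather than by predictive accuracy alone. Below is a minimal sketch of that idea in a toy setup where both the simulator and the ground-truth environment are matrices of positive-response probabilities; the names and the greedy stand-in for policy optimization are placeholders, not the paper's protocol.

```python
import numpy as np

def deployed_value(true_probs, recs, rng):
    """Average reward of recommending item recs[u] to each user u in an
    environment where true_probs[u, i] = P(user u responds positively to i)."""
    n_users = true_probs.shape[0]
    clicks = rng.random(n_users) < true_probs[np.arange(n_users), recs]
    return clicks.mean()

def policy_based_simulator_eval(sim_probs, true_probs, seed=0):
    """Score a simulator by the downstream performance of a policy optimized
    against it: greedily recommend each user's best item *according to the
    simulator*, then measure the reward in the ground-truth environment."""
    rng = np.random.default_rng(seed)
    greedy_recs = sim_probs.argmax(axis=1)  # stand-in for full RL optimization
    return deployed_value(true_probs, greedy_recs, rng)
```

Under this metric, a simulator that, for example, inflates popular items can fit the logged ratings well yet still yield a poor deployed policy, which is exactly the failure mode the debiasing step targets.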