Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning based Recommendation

Recent advances in recommender systems have demonstrated the potential of Reinforcement Learning (RL) to handle the dynamic evolution of interactions between users and recommender systems. However, training an optimal RL agent is generally impractical given the sparse user feedback data that is common in recommender systems. To circumvent the lack of interaction data in current RL-based recommender systems, we propose to learn a general Model-Agnostic Counterfactual Synthesis (MACS) Policy for counterfactual user interaction data augmentation. The counterfactual synthesis policy aims to synthesise counterfactual states while preserving the information in the original state that is relevant to the user's interests, building upon two training approaches we designed: learning with expert demonstrations and joint training. As a result, each counterfactual data point is synthesised based on the current recommendation agent's interaction with the environment, so that it adapts to users' dynamic interests. We integrate the proposed policy with Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC) and Twin Delayed DDPG in an adaptive pipeline with a recommendation agent that can generate counterfactual data to improve recommendation performance. Empirical results on both online simulation and offline datasets demonstrate the effectiveness and generalisation of our counterfactual synthesis policy and verify that it improves the performance of RL recommendation agents.
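
The augmentation loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the MACS implementation: `toy_synthesis_policy` is a hand-written stand-in for the learned synthesis policy (it treats the largest-magnitude half of the state as the part to preserve and resamples the rest), while `toy_agent`, `toy_step` and `collect_with_augmentation` are hypothetical names for a recommendation agent, an environment and the collection step.

```python
import random
from typing import Callable, List, Tuple

State = List[float]
# (state, action, reward, next_state)
Transition = Tuple[State, int, float, State]


def toy_synthesis_policy(state: State, rng: random.Random) -> State:
    """Hypothetical stand-in for a learned counterfactual synthesis policy.

    Assumption: the largest-magnitude half of the dimensions carries the
    user's interests and is preserved; the remaining dimensions are resampled.
    """
    n_keep = (len(state) + 1) // 2
    ranked = sorted(range(len(state)), key=lambda i: abs(state[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [s if i in keep else rng.uniform(-1.0, 1.0) for i, s in enumerate(state)]


def toy_agent(state: State) -> int:
    """Hypothetical recommendation agent: pick the index of the largest feature."""
    return max(range(len(state)), key=lambda i: state[i])


def toy_step(state: State, action: int) -> Tuple[float, State]:
    """Hypothetical environment: reward is the chosen feature, state rotates by one."""
    return state[action], state[1:] + state[:1]


def collect_with_augmentation(
    state: State,
    act: Callable[[State], int],
    step: Callable[[State, int], Tuple[float, State]],
    synthesise: Callable[[State, random.Random], State],
    buffer: List[Transition],
    rng: random.Random,
) -> State:
    """Store one real transition plus one counterfactual transition."""
    # Real interaction between the current agent and the environment.
    action = act(state)
    reward, next_state = step(state, action)
    buffer.append((state, action, reward, next_state))

    # Counterfactual interaction: same agent, synthesised state.
    cf_state = synthesise(state, rng)
    cf_action = act(cf_state)
    cf_reward, cf_next_state = step(cf_state, cf_action)
    buffer.append((cf_state, cf_action, cf_reward, cf_next_state))
    return next_state


if __name__ == "__main__":
    rng = random.Random(0)
    replay_buffer: List[Transition] = []
    s = [0.9, -0.1, 0.5, 0.05]
    for _ in range(3):
        s = collect_with_augmentation(
            s, toy_agent, toy_step, toy_synthesis_policy, replay_buffer, rng
        )
    print(len(replay_buffer), "transitions stored")
```

Because the counterfactual transition is produced by the same agent that is being trained, the augmented data tracks the agent's current behaviour. An off-policy learner such as DDPG, SAC or Twin Delayed DDPG could then sample from the combined buffer, which is the sense in which such a policy can be plugged into different agents.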
