Edge-Compatible Reinforcement Learning for Recommendations

Most reinforcement learning (RL) recommendation systems designed for edge computing must either synchronize during recommendation selection or rely on an unprincipled patchwork of algorithms. In this work, we build on asynchronous coagent policy gradient algorithms [4] to propose a principled solution to this problem. The class of algorithms we propose can be distributed over the internet and run asynchronously and in real time. If an edge node fails to respond to a data request quickly enough, this is not treated as a failure: the algorithm is designed to function and learn in the edge setting, where such network issues are expected. The result is a principled, theoretically grounded RL algorithm designed to be distributed across, and to learn in, this asynchronous environment. We describe the algorithm and a proposed class of architectures in detail, and demonstrate empirically that they perform well in the asynchronous setting, even as network quality degrades.
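To make the coagent idea concrete, here is a minimal, hypothetical Python sketch. It is a toy illustration of the general principle, not the algorithm or architecture proposed in this work. Two softmax coagents cooperate on a one-step recommendation task: an `edge` coagent observes a private user type and emits a discrete message, and a `server` coagent observes that message and picks an item. Each coagent updates its own weights with a local REINFORCE step [16], using only its own input, its own sampled action, and the shared reward. When the edge's message does not arrive in time (simulated with probability `drop_prob`), the server acts on a `MISSING` input instead of waiting, so a network failure becomes part of each coagent's environment rather than an error path. The task, the names (`SoftmaxCoagent`, `train`, `MISSING`, `one_hot`), the moving-average baseline, and all hyperparameters are illustrative assumptions.

```python
import math
import random

MISSING = 2  # hypothetical server input index meaning "edge did not respond in time"


class SoftmaxCoagent:
    """Hypothetical coagent: a linear-softmax policy over discrete actions,
    trained with a local REINFORCE update from its own input/action and the shared reward."""

    def __init__(self, n_inputs, n_actions, lr, rng):
        self.w = [[0.0] * n_inputs for _ in range(n_actions)]
        self.lr = lr
        self.rng = rng

    def probs(self, x):
        # Numerically stable softmax over the linear logits w_k . x.
        logits = [sum(wj * xj for wj, xj in zip(row, x)) for row in self.w]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, x):
        # Sample an action index from the current policy.
        p = self.probs(x)
        return self.rng.choices(range(len(p)), weights=p)[0]

    def update(self, x, a, advantage):
        # REINFORCE step: d log pi(a|x) / d w_k = (1[k == a] - p_k) * x.
        p = self.probs(x)
        for k, row in enumerate(self.w):
            coeff = self.lr * advantage * ((1.0 if k == a else 0.0) - p[k])
            for j in range(len(row)):
                row[j] += coeff * x[j]


def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v


def train(steps=5000, drop_prob=0.2, lr=0.5, seed=0):
    """Toy edge/server recommender. Returns (average reward, edge, server)."""
    rng = random.Random(seed)
    edge = SoftmaxCoagent(n_inputs=2, n_actions=2, lr=lr, rng=rng)    # sees the private user type
    server = SoftmaxCoagent(n_inputs=3, n_actions=2, lr=lr, rng=rng)  # sees message 0, 1, or MISSING
    baseline, total = 0.0, 0.0
    for _ in range(steps):
        user = rng.randrange(2)  # user type u prefers item u
        x_edge = one_hot(user, 2)
        msg = edge.act(x_edge)
        # Simulated network failure: the server does not wait and acts on MISSING instead.
        received = MISSING if rng.random() < drop_prob else msg
        x_server = one_hot(received, 3)
        item = server.act(x_server)
        reward = 1.0 if item == user else 0.0
        advantage = reward - baseline  # baseline does not depend on this step's actions
        # Each coagent updates locally from its own (input, action) and the shared reward.
        edge.update(x_edge, msg, advantage)
        server.update(x_server, item, advantage)
        baseline += 0.01 * (reward - baseline)  # exponential moving average of reward
        total += reward
    return total / steps, edge, server
```

Calling `train(drop_prob=q)` for several values of `q` gives a rough sense of how this toy degrades as the network worsens. Here the best achievable expected reward is (1 - q) + q/2, since on a `MISSING` input the server can do no better than chance. The edge still updates when its message is dropped; because in this toy the drop is independent of the message, those updates add variance but no bias to the edge's gradient estimate.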

References

[1] Qihao Zhou, et al. Federated Reinforcement Learning: Techniques, Applications, and Open Challenges. Intelligence & Robotics, 2021.

[2] Philip S. Thomas, et al. Reinforcement Learning for Strategic Recommendations. ArXiv, 2020.

[3] Yingqian Zhang, et al. Algorithms for Slate Bandits with Non-Separable Reward Functions. ArXiv, 2020.

[4] Philip S. Thomas, et al. Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock. ICML, 2019.

[5] Craig Boutilier, et al. SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets. IJCAI, 2019.

[6] Ed H. Chi, et al. Top-K Off-Policy Correction for a REINFORCE Recommender System. WSDM, 2018.

[7] Timothy A. Mann, et al. Beyond Greedy Ranking: Slate Optimization via List-CVAE. ICLR, 2018.

[8] Elad Eban, et al. Seq2Slate: Re-ranking and Slate Optimization with RNNs. ArXiv, 2018.

[9] Liang Zhang, et al. Deep Reinforcement Learning for Page-wise Recommendations. RecSys, 2018.

[10] W. Bruce Croft, et al. Learning a Deep Listwise Context Model for Ranking Refinement. SIGIR, 2018.

[11] Jung-Woo Ha, et al. Reinforcement Learning based Recommender System using Biclustering Technique. ArXiv, 2018.

[12] Zheng Wen, et al. Scalar Posterior Sampling with Applications. NeurIPS, 2018.

[13] John Langford, et al. Off-policy Evaluation for Slate Recommendation. NIPS, 2016.

[14] R. Beckwith, et al. Tractable POMDP Planning Algorithms for Optimal Teaching in “SPAIS”. 2009.

[15] Nick Craswell, et al. An Experimental Comparison of Click Position-Bias Models. WSDM, 2008.

[16] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 2004.

[17] Thorsten Joachims, et al. Optimizing Search Engines using Clickthrough Data. KDD, 2002.