A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

We develop a mathematical framework for solving multi-task reinforcement learning problems based on a decentralized policy gradient method. The goal in multi-task reinforcement learning is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state and action spaces, but have different rewards and dynamics. An agent is immersed in each of these environments, and the agents communicate with one another by sharing their models (i.e., their policy parameterizations) but not their state/reward trajectories. Our analysis provides a convergence rate for a consensus-based, distributed, entropy-regularized policy gradient method for finding such a policy. We demonstrate the effectiveness of the proposed method using a series of numerical experiments. These experiments range from small-scale "Grid World" problems that readily demonstrate the trade-offs involved in multi-task learning to large-scale problems, where common policies are learned to play multiple Atari games or to navigate an airborne drone in multiple (simulated) environments.
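
To make the consensus idea concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes each task is a single-state problem (a bandit) so that the entropy-regularized objective and its gradient under a softmax policy can be written exactly, whereas the paper treats full environments with their own dynamics. The reward table, the mixing matrix `W`, the regularization weight `lam`, and the step size are all hypothetical values chosen for illustration. Each agent averages its parameters with those of its neighbors and then takes a gradient step on its own task, so only policy parameters are exchanged, never rewards or trajectories.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def local_gradient(theta, rewards, lam):
    # Exact gradient of  pi . rewards + lam * entropy(pi)  with respect to theta,
    # where pi = softmax(theta). Assumption: single-state task (bandit).
    pi = softmax(theta)
    g = rewards - lam * np.log(pi)
    return pi * (g - pi @ g)

def consensus_policy_gradient(rewards, W, lam=0.1, step=0.2, iters=5000):
    # rewards[i] is the (hypothetical) reward vector of agent i's task.
    # W is a doubly stochastic mixing matrix describing who talks to whom.
    n_agents, n_actions = rewards.shape
    thetas = np.zeros((n_agents, n_actions))
    for _ in range(iters):
        grads = np.stack([
            local_gradient(thetas[i], rewards[i], lam) for i in range(n_agents)
        ])
        # Consensus step (average neighbors' parameters) plus local ascent step.
        thetas = W @ thetas + step * grads
    return thetas

# Three illustrative tasks: actions 0 and 1 are each best in one task,
# while action 2 is the best compromise on average.
rewards = np.array([[1.0, 0.0, 0.6],
                    [0.0, 1.0, 0.6],
                    [0.2, 0.2, 0.6]])
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

thetas = consensus_policy_gradient(rewards, W)
policies = np.array([softmax(t) for t in thetas])
disagreement = np.abs(thetas - thetas.mean(axis=0)).max()
print(policies)
print("max parameter disagreement:", disagreement)
```

In this toy setting the agents end up with nearly identical parameters that favor the action performing well across all three tasks, rather than the action that is best for any single task. With a constant step size the agents agree only approximately; a diminishing step size is the usual way to tighten the agreement in decentralized gradient methods.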
