Distributed Multitask Reinforcement Learning with Quadratic Convergence

Multitask reinforcement learning (MTRL) suffers from scalability issues when the number of tasks or trajectories grows large. The main reason behind this drawback is the reliance on centralised solutions. Recent methods exploited the connection between MTRL and general consensus to propose scalable solutions. These methods, however, suffer from two drawbacks: first, they rely on predefined objectives, and second, they offer only linear convergence guarantees. In this paper, we improve over the state of the art by deriving multitask reinforcement learning from a variational inference perspective. We then propose a novel distributed solver for MTRL with quadratic convergence guarantees.
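To make the gap between linear and quadratic convergence concrete, the following minimal sketch compares gradient descent with Newton's method on a one-dimensional convex toy problem. It is an illustration only and not the distributed solver proposed in this paper: the objective f(x) = exp(x) - 2x, the step size 0.1, the starting point, and all function names are assumptions chosen for the example.

```python
import math

# Toy objective (an assumption for illustration): f(x) = exp(x) - 2x,
# which is strictly convex with its minimiser at x* = ln 2.
X_STAR = math.log(2.0)


def grad(x):
    """First derivative of f."""
    return math.exp(x) - 2.0


def hess(x):
    """Second derivative of f (always positive)."""
    return math.exp(x)


def run(step_fn, x0, iters):
    """Apply step_fn repeatedly and record the error |x_k - x*|."""
    x = x0
    errors = []
    for _ in range(iters):
        x = step_fn(x)
        errors.append(abs(x - X_STAR))
    return errors


# Gradient descent with a fixed step: the error shrinks by a roughly
# constant factor per iteration (linear convergence).
gd_errors = run(lambda x: x - 0.1 * grad(x), x0=1.0, iters=5)

# Newton's method: near the optimum the error is roughly squared at
# every iteration (quadratic convergence).
newton_errors = run(lambda x: x - grad(x) / hess(x), x0=1.0, iters=5)

for k, (g, n) in enumerate(zip(gd_errors, newton_errors), start=1):
    print(f"iter {k}: gradient descent error = {g:.3e}, Newton error = {n:.3e}")
```

In this toy setting the Newton iterates reach machine precision within a handful of steps, while the fixed-step gradient iterates are still far from the optimum. This difference in behaviour is what a quadratic convergence guarantee refers to.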
