Distributed lifelong reinforcement learning with sub-linear regret

In this paper we propose a distributed second-order method for lifelong reinforcement learning (LRL). Upon observing a new task, our algorithm scales state-of-the-art LRL by approximating the Newton direction up-to-any arbitrary precision ∊ > 0, while guaranteeing accurate solutions. We analyze the theoretical properties of this new method and derive, for the first time to the best of our knowledge, sublinear regret under this setting.

[1]  Eric Eaton,et al.  ELLA: An Efficient Lifelong Learning Algorithm , 2013, ICML.

[2]  Asuman E. Ozdaglar,et al.  Distributed Alternating Direction Method of Multipliers , 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[3]  Alejandro Ribeiro,et al.  Accelerated Dual Descent for Network Flow Optimization , 2014, IEEE Transactions on Automatic Control.

[4]  Alessandro Lazaric,et al.  Sequential Transfer in Multi-armed Bandit with Finite Set of Models , 2013, NIPS.

[5]  Bart De Schutter,et al.  Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .

[6]  Aryan Mokhtari,et al.  Network Newton-Part II: Convergence Rate and Implementation , 2015, 1504.06020.

[7]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[8]  Richard Peng,et al.  An efficient parallel solver for SDD linear systems , 2013, STOC.

[9]  Eric Eaton,et al.  Online Multi-Task Learning for Policy Gradient Methods , 2014, ICML.

[10]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[11]  Alex Olshevsky,et al.  Linear Time Average Consensus on Fixed Graphs and Implications for Decentralized Optimization and Multi-Agent Control , 2014, 1411.4186.

[12]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[13]  Peter L. Bartlett,et al.  Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions , 2013, NIPS.

[14]  Eric Eaton,et al.  Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret , 2015, ICML.

[15]  Haitham Bou-Ammar,et al.  Reinforcement learning transfer via sparse coding , 2012, AAMAS.

[16]  Jan Peters,et al.  Policy Search for Motor Primitives in Robotics , 2008, NIPS 2008.

[17]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[18]  Haitham Bou-Ammar,et al.  An exact distributed newton method for reinforcement learning , 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).

[19]  Alan Fern,et al.  Multi-task reinforcement learning: a hierarchical Bayesian approach , 2007, ICML '07.

[20]  Stefan Schaal,et al.  2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .

[21]  Aryan Mokhtari,et al.  Network Newton-Part I: Algorithm and Convergence , 2015, 1504.06017.

[22]  Asuman E. Ozdaglar,et al.  Distributed Subgradient Methods for Multi-Agent Optimization , 2009, IEEE Transactions on Automatic Control.