Paper information - Distributed lifelong reinforcement learning with sub-linear regret

In this paper, we propose a distributed second-order method for lifelong reinforcement learning (LRL). Upon observing a new task, our algorithm scales state-of-the-art LRL by approximating the Newton direction up to any arbitrary precision ε > 0, while guaranteeing accurate solutions. We analyze the theoretical properties of this new method and derive, for the first time to the best of our knowledge, sublinear regret under this setting.
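
To make "approximating the Newton direction up to precision ε" concrete, here is a minimal single-machine sketch. It is not the distributed solver from the paper: it swaps in plain conjugate gradient as a stand-in, and it assumes a symmetric positive-definite Hessian. The function name `approx_newton_direction`, the helper names, and the stopping rule (residual norm at most ε times the gradient norm) are all illustrative assumptions.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def approx_newton_direction(hessian, grad, eps):
    """Approximate the Newton direction d solving H d = -g.

    Hypothetical helper: runs conjugate gradient until
    ||H d + g|| <= eps * ||g||, assuming H is symmetric positive definite.
    """
    n = len(grad)
    d = [0.0] * n
    r = [-g for g in grad]          # residual of H d = -g at d = 0
    p = list(r)                     # first search direction
    rs = dot(r, r)
    tol_sq = (eps ** 2) * dot(grad, grad)
    for _ in range(10 * n):         # generous iteration cap
        if rs <= tol_sq:
            break
        Hp = matvec(hessian, p)
        alpha = rs / dot(p, Hp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d


# Toy problem (made-up numbers): a 2x2 Hessian and a gradient.
H = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]
direction = approx_newton_direction(H, g, eps=1e-8)
```

A smaller `eps` buys a direction closer to the exact Newton step at the price of more iterations; the method in the abstract makes that same accuracy trade-off in a distributed setting.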
