Source Task Creation for Curriculum Learning

Transfer learning in reinforcement learning has been an active area of research over the past decade. In transfer learning, training on a source task is leveraged to speed up or otherwise improve learning on a target task. This paper presents the more ambitious problem of curriculum learning in reinforcement learning, in which the goal is to design a sequence of source tasks for an agent to train on, such that final performance or learning speed is improved. We take the position that each stage of such a curriculum should be tailored to the current ability of the agent in order to promote learning new behaviors. Thus, as a first step towards creating a curriculum, the trainer must be able to create novel, agent-specific source tasks. We explore how such a space of useful tasks can be created using a parameterized model of the domain and observed trajectories on the target task. We show experimentally that these methods can be used to form components of a curriculum, and that such a curriculum can be used successfully for transfer learning in two challenging multiagent reinforcement learning domains.
