Source Task Creation for Curriculum Learning

Transfer learning in reinforcement learning has been an active area of research over the past decade. In transfer learning, training on a source task is leveraged to speed up or otherwise improve learning on a target task. This paper presents the more ambitious problem of curriculum learning in reinforcement learning, in which the goal is to design a sequence of source tasks for an agent to train on, such that final performance or learning speed is improved. We take the position that each stage of such a curriculum should be tailored to the current ability of the agent in order to promote learning new behaviors. Thus, as a first step towards creating a curriculum, the trainer must be able to create novel, agent-specific source tasks. We explore how such a space of useful tasks can be created using a parameterized model of the domain and observed trajectories on the target task. We experimentally show that these methods can be used to form components of a curriculum and that such a curriculum can be used successfully for transfer learning in two challenging multiagent reinforcement learning domains.

Peter Stone | Jivko Sinapov | Matteo Leonetti | Sanmit Narvekar
