Transfer via inter-task mappings in policy search reinforcement learning

The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITMPS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing they perform comparably to hand-coded mappings.

[1]  Richard S. Sutton,et al.  Training and Tracking in Robotics , 1985, IJCAI.

[2]  Marco Colombetti,et al.  Robot shaping: developing situated agents through learning , 1992 .

[3]  L. Darrell Whitley,et al.  International Workshop on Combinations of Genetic Algorithms and Neural Networks , 1992 .

[4]  Maja J. Mataric,et al.  Reward Functions for Accelerated Learning , 1994, ICML.

[5]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[6]  William W. Cohen Fast Effective Rule Induction , 1995, ICML.

[7]  X. Yao Evolving Artificial Neural Networks , 1999 .

[8]  Ian Witten,et al.  Data Mining , 2000 .

[9]  Risto Miikkulainen,et al.  Evolving Neural Networks through Augmenting Topologies , 2002, Evolutionary Computation.

[10]  Carlos Guestrin,et al.  Generalizing plans to new environments in relational MDPs , 2003, IJCAI 2003.

[11]  Rajarshi Das,et al.  Utility functions in autonomic systems , 2004, International Conference on Autonomic Computing, 2004. Proceedings..

[12]  Peter Stone,et al.  Behavior transfer for value-function-based reinforcement learning , 2005, AAMAS '05.

[13]  Peter Stone,et al.  Reinforcement Learning for RoboCup Soccer Keepaway , 2005, Adapt. Behav..

[14]  Jude W. Shavlik,et al.  Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another , 2005, ECML.

[15]  Peter Stone,et al.  Keepaway Soccer: From Machine Learning Testbed to Benchmark , 2005, RoboCup.

[16]  Shimon Whiteson,et al.  Evolutionary Function Approximation for Reinforcement Learning , 2006, J. Mach. Learn. Res..

[17]  Shimon Whiteson,et al.  Comparing evolutionary and temporal difference methods in a reinforcement learning domain , 2006, GECCO.

[18]  Matthew Taylor and Shimon Whiteson and Peter Stone,et al.  Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning , 2006 .

[19]  Vishal Soni,et al.  Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains , 2006, AAAI.