Paper information - Transfer via inter-task mappings in policy search reinforcement learning

The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing that the learned mappings perform comparably to hand-coded mappings.
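The abstract does not spell out the transfer functional, so the following is only a minimal sketch of one plausible reading: each policy is a single layer of weights from state variables to actions, and a target weight is copied from the source weight that the inter-task mappings point to. The names `transfer_policy`, `transfer_population`, `state_map`, and `action_map`, the single-layer representation, and the choice to leave unmapped weights at a default of 0.0 are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical policy representation: weights[action][state_variable].
Weights = List[List[float]]


def transfer_policy(
    source_weights: Weights,
    state_map: Dict[int, Optional[int]],
    action_map: Dict[int, Optional[int]],
    n_target_actions: int,
    n_target_states: int,
    default: float = 0.0,
) -> Weights:
    """Build target-task weights from source-task weights.

    state_map sends a target state-variable index to a source index,
    action_map does the same for actions. A missing key or a None value
    models an incomplete mapping; those weights keep the default
    (an assumption of this sketch).
    """
    target = [[default] * n_target_states for _ in range(n_target_actions)]
    for a_t in range(n_target_actions):
        a_s = action_map.get(a_t)
        if a_s is None:
            continue  # no source counterpart for this target action
        for x_t in range(n_target_states):
            x_s = state_map.get(x_t)
            if x_s is None:
                continue  # no source counterpart for this state variable
            target[a_t][x_t] = source_weights[a_s][x_s]
    return target


def transfer_population(
    population: List[Weights],
    state_map: Dict[int, Optional[int]],
    action_map: Dict[int, Optional[int]],
    n_target_actions: int,
    n_target_states: int,
) -> List[Weights]:
    """Apply the same mapping to every policy in a population."""
    return [
        transfer_policy(p, state_map, action_map, n_target_actions, n_target_states)
        for p in population
    ]


# Toy example: 2 state variables and 2 actions in the source task,
# 3 of each in the target task.
source = [[0.1, 0.2], [0.3, 0.4]]
state_map = {0: 0, 1: 1, 2: 1}      # target variable 2 reuses source variable 1
action_map = {0: 0, 1: 1, 2: None}  # target action 2 is unmapped (incomplete mapping)

target = transfer_policy(source, state_map, action_map, 3, 3)
population = transfer_population([source, source], state_map, action_map, 3, 3)
```

In this toy setup the third target action starts with all-default weights, so policy search in the target task would have to learn it from scratch, while the mapped actions start from the source policy's behavior.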
