Reward Mapping for Transfer in Long-Lived Agents

We consider how to transfer knowledge from previous tasks (MDPs) to a current task in long-lived and bounded agents that must solve a sequence of tasks over a finite lifetime. A novel aspect of our transfer approach is that we reuse reward functions. While this may seem counterintuitive, we build on the insight of recent work on the optimal rewards problem that guiding an agent's behavior with reward functions other than the task-specifying reward function can help overcome computational bounds of the agent. Specifically, we use good guidance reward functions learned on previous tasks in the sequence to incrementally train a reward mapping function that maps task-specifying reward functions into good initial guidance reward functions for subsequent tasks. We demonstrate that our approach can substantially improve the agent's performance relative to other approaches, including an approach that transfers policies.
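To make the idea of a reward mapping concrete, below is a minimal sketch (not the paper's exact method) of one plausible instantiation: a linear map, trained incrementally across tasks, from a task-specifying reward vector to an initial guidance reward vector for the next task. All names, the linear form, and the update rule are illustrative assumptions.

```python
import numpy as np


class RewardMapping:
    """Hypothetical reward mapping function.

    Predicts an initial guidance reward vector from a task-specifying
    reward vector, and is trained incrementally from pairs
    (task reward, good guidance reward) collected on earlier tasks.
    """

    def __init__(self, dim, lr=0.1):
        self.W = np.zeros((dim, dim))  # linear map: r_guide ~ W @ r_task
        self.lr = lr

    def predict(self, r_task):
        # Initial guidance reward used to start learning on a new task.
        return self.W @ r_task

    def update(self, r_task, r_guide_good):
        # One stochastic gradient step on squared prediction error,
        # using the good guidance reward found on the finished task.
        err = self.predict(r_task) - r_guide_good
        self.W -= self.lr * np.outer(err, r_task)


if __name__ == "__main__":
    # Usage sketch over a task sequence (details are placeholders).
    rng = np.random.default_rng(0)
    mapping = RewardMapping(dim=4)
    for task in range(100):
        r_task = rng.normal(size=4)        # task-specifying reward features
        r_init = mapping.predict(r_task)   # transferred initial guidance reward
        # ... run the bounded agent with r_init and refine it online
        # (e.g., by gradient ascent on task return) to get r_good ...
        r_good = 2.0 * r_task + 0.1        # placeholder for the learned reward
        mapping.update(r_task, r_good)
```

The point of the sketch is only the data flow: each finished task contributes a training pair to the mapping, and each new task starts from the mapping's prediction rather than from scratch.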
