Overcoming Erroneous Domain Knowledge in Plan-Based Reward Shaping (Extended Abstract)
Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used to guide the agent towards the optimal behaviour. However, if the provided domain knowledge is wrong, the agent has been shown to take longer to learn the optimal policy. In some previous cases, it was better to ignore all prior knowledge even though it was only partially erroneous. This paper introduces a novel use of knowledge revision to overcome erroneous domain knowledge provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform an agent receiving plan-based reward shaping without knowledge revision.
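For context on the shaping scheme the abstract refers to, the sketch below shows one way plan-based, potential-based reward shaping can be implemented: the shaping potential grows with the agent's progress through an abstract STRIPS plan, and the potential-based term γΦ(s′) − Φ(s) is added to the environment reward. This is a minimal illustration under stated assumptions, not the authors' implementation; the example plan steps, the scaling constant `OMEGA`, and the `state.satisfies` predicate are hypothetical.

```python
# Minimal sketch of plan-based, potential-based reward shaping.
# The plan, state abstraction, and scaling factor are illustrative assumptions.

GAMMA = 0.99   # discount factor of the underlying MDP
OMEGA = 10.0   # scale of the shaping potential (assumed value)

# A hypothetical STRIPS plan: the agent is encouraged to progress
# through these abstract plan steps in order.
PLAN_STEPS = ["collect_key", "open_door", "reach_goal"]


def plan_progress(state):
    """Return how many consecutive plan steps are satisfied in `state`.

    `state.satisfies(step)` is an assumed, problem-specific predicate that
    maps the low-level state onto the abstract STRIPS facts of the plan.
    """
    progress = 0
    for step in PLAN_STEPS:
        if state.satisfies(step):
            progress += 1
        else:
            break
    return progress


def potential(state):
    # The potential Phi(s) increases with progress along the plan.
    return OMEGA * plan_progress(state)


def shaped_reward(env_reward, state, next_state):
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s),
    # which preserves the optimal policy of the original problem.
    return env_reward + GAMMA * potential(next_state) - potential(state)
```

If the plan encodes erroneous domain knowledge, the potential rewards progress towards the wrong abstract states, which is the failure case the proposed knowledge revision is meant to correct.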