Knowledge Revision for Reinforcement Learning with Abstract MDPs (Extended Abstract)

Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention shifts from tabula-rasa approaches to methods where heuristic domain knowledge can be given to agents, an important problem arises: how can agents deal with erroneous knowledge, and what is its impact on their behavior, both in single-agent settings and in multi-agent settings where agents face conflicting goals? Previous research demonstrated plan-based reward shaping with knowledge revision in a single-agent scenario, showing that agents can quickly identify and revise erroneous knowledge and thus benefit from more accurate plans. In a multi-agent setting, however, the use of individual plans as a source of reward shaping has been less successful because of the agents' conflicting goals. In this paper we present the use of abstract MDPs as a method of providing heuristic knowledge, coupled with a revision algorithm to manage cases where the provided domain knowledge is wrong. We show how agents can deal with erroneous knowledge in the single-agent case and how this method can be used for conflict resolution in a multi-agent environment.
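To make the shaping mechanism concrete, the sketch below illustrates one common realisation of this idea: potential-based reward shaping, F(s, s') = γΦ(s') − Φ(s), where the potential Φ is taken from the value function of a small, hand-specified abstract MDP solved by value iteration. The abstract MDP, the state-abstraction map, and all numeric values are illustrative assumptions, not the model or revision algorithm used in the paper.

# Minimal sketch (assumed, not the authors' implementation) of reward shaping
# from an abstract MDP: solve the abstract model by value iteration and use
# its value function as the shaping potential.
import numpy as np

GAMMA = 0.99

# Hypothetical deterministic abstract MDP: 3 abstract states, 2 abstract actions.
# transitions[s, a] -> next abstract state; rewards[s, a] -> reward.
transitions = np.array([[1, 0],
                        [2, 0],
                        [2, 2]])          # abstract state 2 is absorbing
rewards = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 0.0]])

def value_iteration(transitions, rewards, gamma=GAMMA, tol=1e-6):
    """Solve the abstract MDP; the resulting V is used as the potential Phi."""
    v = np.zeros(len(transitions))
    while True:
        q = rewards + gamma * v[transitions]   # Q(s, a) for all state-action pairs
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

V_abstract = value_iteration(transitions, rewards)

def abstract_state(ground_state):
    """Illustrative abstraction map from a ground state to an abstract state id."""
    return ground_state % 3                    # placeholder mapping, assumption only

def shaping_reward(s, s_next, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * V_abstract[abstract_state(s_next)] - V_abstract[abstract_state(s)]

Because the shaping term is potential-based, it leaves the optimal policy of the underlying task unchanged even when the abstract MDP, and hence Φ, encodes wrong knowledge; under these assumptions, a revision step could amount to updating the abstract transitions or rewards from observed experience and re-running value iteration.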
