Plan-based reward shaping for multi-agent reinforcement learning

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains how to generate a useful potential function. Previous research demonstrated that STRIPS operator knowledge can be used to automatically generate a potential function for single-agent reinforcement learning. Building on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on either joint or individual plan knowledge can significantly improve MARL performance compared with learning without shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of the agents' individual plans leads to conflicts.
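
For context, potential-based reward shaping, as formalised by Ng, Harada, and Russell (1999), augments the environment reward with a shaping term of the form

    F(s, a, s') = γ Φ(s') − Φ(s),

where Φ is a potential function over states and γ is the discount factor. Shaping of this form does not alter the optimal policy in single-agent MDPs, and Devlin and Kudenko (2011) showed that it preserves the Nash equilibria of a multi-agent system. A minimal sketch of a plan-based potential, following the single-agent approach of Grzes and Kudenko (2008), assigns each state a value proportional to how far along the (joint or individual) plan the agent has progressed, for example

    Φ(s) = ω · step(s),

where step(s) is the index of the latest plan step achieved in state s and ω is a scaling constant. The exact mapping from states to plan steps and the choice of ω are illustrative assumptions here, not details taken from this abstract.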
