Plan-based reward shaping for multi-agent reinforcement learning

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains how to generate a useful potential function. Previous research demonstrated that STRIPS operator knowledge can be used to automatically generate a potential function for single-agent reinforcement learning. Building on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on either joint or individual plan knowledge can significantly improve MARL performance compared with learning without shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of the agents' individual plans leads to conflicts.
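
For context, potential-based reward shaping, as formalised by Ng, Harada, and Russell (1999), augments the environment reward with a shaping term of the form

    F(s, a, s') = γ Φ(s') − Φ(s),

where Φ is a potential function over states and γ is the discount factor. Shaping of this form does not alter the optimal policy in single-agent MDPs, and Devlin and Kudenko (2011) showed that it preserves the Nash equilibria of a multi-agent system. A minimal sketch of a plan-based potential, following the single-agent approach of Grzes and Kudenko (2008), assigns each state a value proportional to how far along the (joint or individual) plan the agent has progressed, for example

    Φ(s) = ω · step(s),

where step(s) is the index of the latest plan step achieved in state s and ω is a scaling constant. The exact mapping from states to plan steps and the choice of ω are illustrative assumptions here, not details taken from this abstract.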
