Policy Invariance under Reward Transformations for General-Sum Stochastic Games

We extend the potential-based shapingmethod fromMarkov decision processes to multiplayer general-sum stochastic games. We prove that the Nash equilibria in a stochastic game remains unchanged after potential-based shaping is applied to the environment. The property of policy invariance provides a possible way of speeding convergence when learning to play a stochastic game.

[1]  Sam Devlin,et al.  Theoretical considerations of potential-based reward shaping for multi-agent systems , 2011, AAMAS.

[2]  Eric Wiewiora,et al.  Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res..

[3]  Andrew G. Barto,et al.  Shaping as a method for accelerating reinforcement learning , 1992, Proceedings of the 1992 IEEE International Symposium on Intelligent Control.

[4]  Michael L. Littman,et al.  Social reward shaping in the prisoner's dilemma , 2008, AAMAS.

[5]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[6]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[7]  Michael L. Littman,et al.  Potential-based Shaping in Model-based Reinforcement Learning , 2008, AAAI.

[8]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[9]  Dimitri P. Bertsekas,et al.  Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[10]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[11]  L. Shapley,et al.  Stochastic Games* , 1953, Proceedings of the National Academy of Sciences.

[12]  Manuela Veloso,et al.  Multiagent learning in the presence of agents with limitations , 2003 .

[13]  T. Başar,et al.  Dynamic Noncooperative Game Theory , 1982 .

[14]  Marco Colombetti,et al.  Robot Shaping: Developing Autonomous Agents Through Learning , 1994, Artif. Intell..

[15]  Preben Alstrøm,et al.  Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.

[16]  Maja J. Mataric,et al.  Reward Functions for Accelerated Learning , 1994, ICML.