Paper Information - Policy Invariance under Reward Transformations for General-Sum Stochastic Games


We extend the potential-based shaping method from Markov decision processes to multiplayer general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. This policy-invariance property offers a possible way to speed up convergence when learning to play a stochastic game.
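The abstract does not spell out the shaping term, so the sketch below assumes the standard potential-based form F(s, s') = γΦ(s') − Φ(s), applied with a separate potential for each player. The potentials, state labels, discount factor, and function names are all hypothetical and chosen only for illustration. The check shows that the discounted sum of shaping terms along a trajectory telescopes to γ^T Φ(s_T) − Φ(s_0), which depends only on the first and last states. This is an intuition for the invariance result, not a reproduction of the paper's proof.

```python
import random

GAMMA = 0.9  # discount factor (assumed value, not from the paper)

# Hypothetical per-player potential functions over three states.
POTENTIALS = {
    "player_1": {0: 0.0, 1: 2.0, 2: -1.0},
    "player_2": {0: 1.5, 1: -0.5, 2: 3.0},
}


def shaping_reward(phi, s, s_next, gamma=GAMMA):
    """Assumed shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi[s_next] - phi[s]


def shaping_return(phi, trajectory, gamma=GAMMA):
    """Discounted sum of shaping terms along a state trajectory s_0..s_T."""
    transitions = zip(trajectory, trajectory[1:])
    return sum(
        (gamma ** t) * shaping_reward(phi, s, s_next, gamma)
        for t, (s, s_next) in enumerate(transitions)
    )


def telescoped_return(phi, trajectory, gamma=GAMMA):
    """Closed form of the same sum: gamma^T * phi(s_T) - phi(s_0)."""
    horizon = len(trajectory) - 1
    return (gamma ** horizon) * phi[trajectory[-1]] - phi[trajectory[0]]


# An arbitrary trajectory; the identity holds for any state sequence.
rng = random.Random(0)
trajectory = [rng.choice([0, 1, 2]) for _ in range(8)]

for player, phi in POTENTIALS.items():
    direct = shaping_return(phi, trajectory)
    closed = telescoped_return(phi, trajectory)
    print(player, round(direct, 6), round(closed, 6))
```

Because the intermediate states cancel, the extra return each player receives from shaping does not depend on which joint actions produced the trajectory. As the horizon grows with a bounded potential and γ < 1, the term γ^T Φ(s_T) vanishes and only −Φ(s_0) remains.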
