Minimax Value Iteration Applied to Robotic Soccer

This work develops a dynamic programming algorithm for a class of Stochastic Games, two-person zero-sum games, inspired by the reinforcement learning algorithm Minimax-Q. In each state of the game, linear programming is used to compute a Nash equilibrium, which guarantees optimality in the worst case. The method is then applied to a behavioral model of a robotic soccer game. The goal is to find the worst-case strategy for such a team, so that a lower bound on the team's performance is guaranteed. Most of the time, the method converges to a conservative solution that, above all, tries to keep the opponent from scoring rather than trying to score.
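As a rough illustration of the backup such a minimax value-iteration scheme performs, the following Python sketch solves the one-stage zero-sum matrix game at each state with a linear program, as the abstract describes. The array names R and P, the helper solve_matrix_game, and the use of scipy.optimize.linprog are assumptions for illustration, not details taken from the paper.

```python
# A minimal sketch of minimax value iteration for a two-player zero-sum
# stochastic game, assuming hypothetical tabular arrays R and P.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(Q):
    """Solve the zero-sum matrix game Q (rows: our actions, cols: opponent's).

    Returns the game value v and our maximin mixed strategy pi via the LP
        max v  s.t.  sum_a pi[a] * Q[a, o] >= v  for every opponent action o,
                     pi >= 0, sum(pi) == 1.
    """
    n_a, n_o = Q.shape
    # Decision variables: [pi_0, ..., pi_{n_a-1}, v]; linprog minimizes -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # Constraints in A_ub x <= b_ub form: v - pi^T Q[:, o] <= 0 for each o.
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # Equality constraint: the strategy probabilities sum to one.
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * n_a + [(None, None)]  # pi >= 0, v unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

def minimax_value_iteration(R, P, gamma=0.95, tol=1e-6, max_iter=1000):
    """R[s, a, o]: immediate reward; P[s, a, o, s']: transition probabilities."""
    n_s, n_a, n_o = R.shape
    V = np.zeros(n_s)
    policy = np.zeros((n_s, n_a))
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for s in range(n_s):
            Q = R[s] + gamma * P[s] @ V  # Q[a, o] one-step backup
            V_new[s], policy[s] = solve_matrix_game(Q)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, policy
```

The LP encodes the maximin problem directly: the value v is maximized subject to the mixed strategy pi securing at least v against every opponent action, which is precisely the worst-case guarantee the abstract refers to.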
