Minimax Value Iteration Applied to Robotic Soccer

This work develops a dynamic programming algorithm for a class of Stochastic Games, two-person zero-sum games, inspired by the reinforcement learning algorithm Minimax-Q. In each state of the game, linear programming is used to compute a Nash equilibrium, which guarantees optimality in the worst case. The method is then applied to a behavioral model of a robotic soccer game. The goal is to find the worst-case strategy for such a team, so that a lower bound on the team's performance is guaranteed. Most of the time, the method converges to a conservative solution that, above all, tries to keep the opponent from scoring rather than trying to score.
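As a rough illustration of the backup such a minimax value-iteration scheme performs, the following Python sketch solves the one-stage zero-sum matrix game at each state with a linear program, as the abstract describes. The array names R and P, the helper solve_matrix_game, and the use of scipy.optimize.linprog are assumptions for illustration, not details taken from the paper.

```python
# A minimal sketch of minimax value iteration for a two-player zero-sum
# stochastic game, assuming hypothetical tabular arrays R and P.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(Q):
    """Solve the zero-sum matrix game Q (rows: our actions, cols: opponent's).

    Returns the game value v and our maximin mixed strategy pi via the LP
        max v  s.t.  sum_a pi[a] * Q[a, o] >= v  for every opponent action o,
                     pi >= 0, sum(pi) == 1.
    """
    n_a, n_o = Q.shape
    # Decision variables: [pi_0, ..., pi_{n_a-1}, v]; linprog minimizes -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # Constraints in A_ub x <= b_ub form: v - pi^T Q[:, o] <= 0 for each o.
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # Equality constraint: the strategy probabilities sum to one.
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * n_a + [(None, None)]  # pi >= 0, v unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

def minimax_value_iteration(R, P, gamma=0.95, tol=1e-6, max_iter=1000):
    """R[s, a, o]: immediate reward; P[s, a, o, s']: transition probabilities."""
    n_s, n_a, n_o = R.shape
    V = np.zeros(n_s)
    policy = np.zeros((n_s, n_a))
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for s in range(n_s):
            Q = R[s] + gamma * P[s] @ V  # Q[a, o] one-step backup
            V_new[s], policy[s] = solve_matrix_game(Q)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, policy
```

The LP encodes the maximin problem directly: the value v is maximized subject to the mixed strategy pi securing at least v against every opponent action, which is precisely the worst-case guarantee the abstract refers to.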
