Paper information - Minimax TD-Learning with Neural Nets in a Markov Game

A minimax version of temporal difference learning (minimax TD-learning) is presented, similar to minimax Q-learning. The algorithm is used to train a neural net to play Campaign, a two-player zero-sum Markov game with imperfect information. Two different criteria for evaluating game-playing agents are used, and their relation to game theory is shown. Practical aspects of linear programming and fictitious play, as used for solving matrix games, are also discussed.
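The abstract names fictitious play as one way to solve matrix games. As a rough illustration only, the following is a minimal textbook sketch of fictitious play for a zero-sum matrix game; it is not taken from the paper, and the function name `fictitious_play`, its parameters, and the matching-pennies example are assumptions made here for illustration.

```python
def fictitious_play(payoff, iterations=10000):
    """Approximate the solution of a zero-sum matrix game (illustrative sketch).

    payoff[i][j] is the row player's payoff; the column player receives
    its negative. Returns the empirical mixed strategies of both players
    and a lower and upper bound on the game value.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Cumulative payoff of each pure action against the opponent's past play.
    row_totals = [0.0] * n_rows  # row action r vs. all column actions so far
    col_totals = [0.0] * n_cols  # column action c vs. all row actions so far

    i, j = 0, 0  # arbitrary initial actions (an assumption of this sketch)
    for _ in range(iterations):
        row_counts[i] += 1
        col_counts[j] += 1
        for r in range(n_rows):
            row_totals[r] += payoff[r][j]
        for c in range(n_cols):
            col_totals[c] += payoff[i][c]
        # Each player best-responds to the opponent's empirical mixture:
        # the row player maximizes, the column player minimizes.
        i = max(range(n_rows), key=lambda r: row_totals[r])
        j = min(range(n_cols), key=lambda c: col_totals[c])

    row_strategy = [k / iterations for k in row_counts]
    col_strategy = [k / iterations for k in col_counts]
    lower = min(col_totals) / iterations  # payoff the row mixture guarantees
    upper = max(row_totals) / iterations  # most the column mixture concedes
    return row_strategy, col_strategy, lower, upper


# Usage: matching pennies, whose value is 0 with uniform optimal strategies.
matching_pennies = [[1, -1], [-1, 1]]
row, col, lower, upper = fictitious_play(matching_pennies)
print(row, col, lower, upper)
```

The two bounds always bracket the true game value, so their gap gives a simple stopping criterion. Convergence of fictitious play is slow compared with solving the equivalent linear program, which may be one of the practical trade-offs the abstract alludes to.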

Fredrik A. Dahl | Ole Martin Halck

[1] Howard Raiffa, et al. Games and Decisions, 1958.

[2] L. Berkovitz. The Tactical Air Game: A Multimove Game with Mixed Strategy Solution, 1975.

[3] J. D. Grote. The Theory and Application of Differential Games, 1975.

[4] William H. Press, et al. Numerical Recipes in FORTRAN - The Art of Scientific Computing, 2nd Edition, 1987.

[5] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.

[6] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[7] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[8] Avi Pfeffer, et al. Representations and Solutions for Game-Theoretic Problems, 1997, Artif. Intell.

[9] Jonathan Schaeffer, et al. Poker as Testbed for AI Research, 1998, Canadian Conference on AI.

[10] Jonathan Schaeffer, et al. Poker as a Testbed for Machine Intelligence Research, 1998.

[11] Csaba Szepesvári, et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms, 1999, Neural Computation.

[12] Johannes Fürnkranz, et al. Proceedings of the ICML-99 Workshop on Machine Learning in Game Playing, 1999.

[13] William H. Press, et al. Numerical Recipes in C, 2002.