An Emergence Of Game Strategy In Multiagent Systems

The emergence of game strategy in multiagent systems is studied, and symbolic and subsymbolic (neural network) approaches are compared. The symbolic approach is represented by a backtrack algorithm with a specified search depth, whereas the subsymbolic approach is represented by feedforward neural networks adapted by the reinforcement-learning temporal-difference technique TD(λ). Simplified checkers is used as a test game. The problem is studied in the framework of a multiagent system, where each agent is endowed with a neural network used for the classification of checkers positions. Three different strategies are used. The first strategy corresponds to a single agent that repeatedly plays games against a MinMax version of the backtrack search method. The second strategy corresponds to a population of agents that repeatedly play megatournaments, in which each agent plays two games against every other agent, one with the white pieces and one with the black pieces; after each game, both agents modify their neural networks by reinforcement learning. The third strategy is an evolutionary modification of the second: when a megatournament is finished, each agent is assigned a fitness that reflects its success in that megatournament (more successful agents have greater fitness). It is demonstrated that all three approaches lead to a population of agents that play checkers very successfully against a backtrack algorithm with a search depth of 3.
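
As a concrete illustration of the TD(λ) adaptation mentioned above, the following minimal Python sketch shows how an agent's feedforward evaluation network could update its weights with eligibility traces after each move. The board encoding (a 32-element feature vector), the network size, and all hyperparameter values are illustrative assumptions, not the implementation used in the paper.

import numpy as np

class TDAgent:
    # Minimal sketch of a position-evaluation network trained by TD(lambda);
    # sizes and learning constants below are assumptions for illustration only.
    def __init__(self, n_inputs=32, n_hidden=40, alpha=0.01, lam=0.7, gamma=1.0):
        rng = np.random.default_rng(0)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.W2 = rng.normal(0.0, 0.1, n_hidden)
        self.alpha, self.lam, self.gamma = alpha, lam, gamma
        self.reset_traces()

    def reset_traces(self):
        # eligibility traces, one per weight; cleared before each new game
        self.e1 = np.zeros_like(self.W1)
        self.e2 = np.zeros_like(self.W2)

    def value(self, x):
        # V(s): estimated chance that the side to move wins from position x
        h = np.tanh(self.W1 @ x)
        v = 1.0 / (1.0 + np.exp(-(self.W2 @ h)))
        return v, h

    def td_update(self, x, reward, v_next):
        # one TD(lambda) step: delta = r + gamma * V(s') - V(s)
        v, h = self.value(x)
        delta = reward + self.gamma * v_next - v
        # gradient of V(s) w.r.t. the weights (sigmoid output, tanh hidden layer)
        dv = v * (1.0 - v)
        g2 = dv * h
        g1 = np.outer(dv * self.W2 * (1.0 - h ** 2), x)
        # decay the traces and accumulate the current gradient
        self.e2 = self.gamma * self.lam * self.e2 + g2
        self.e1 = self.gamma * self.lam * self.e1 + g1
        # move the weights along the traces, scaled by the TD error
        self.W2 += self.alpha * delta * self.e2
        self.W1 += self.alpha * delta * self.e1

In a setup of this kind, td_update(x_t, 0.0, self.value(x_next)[0]) would be called after every move of a game and td_update(x_T, outcome, 0.0) at the terminal position, with outcome equal to 1 for a win and 0 for a loss; reset_traces() would then be called before the agent's next game of the megatournament.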
