Coevolution of a Backgammon Player

One of the persistent themes in Artificial Life research is the use of co-evolutionary arms races in the development of specific and complex behaviors. However, other than Sims’s work on artificial robots, most of this work has addressed very simple games such as the prisoner’s dilemma or predator and prey. Following Tesauro’s work on TD-Gammon, we used a 4000-parameter feed-forward neural network to develop a competitive backgammon evaluation function. Play proceeds by rolling the dice, applying the network to all legal moves, and choosing the move with the highest evaluation. However, no back-propagation, reinforcement, or temporal-difference learning methods were employed. Instead, we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all-zero weights and proceed simply by playing the current champion network against a slightly mutated challenger, changing weights when the challenger wins. Our results show co-evolution to be a powerful machine learning method, even when coupled with simple hillclimbing, and suggest that the surprising success of Tesauro’s program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself than with sophistication in the learning techniques.
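The champion-versus-challenger loop described above can be illustrated with a minimal sketch. It is not the authors' implementation. A linear evaluator stands in for the 4000-parameter network, and `toy_game`, `TRUE_VALUE`, the Gaussian mutation with width `sigma`, and the rule of replacing the champion outright when the challenger wins are all assumptions introduced here for illustration; a real experiment would plug a backgammon engine in as `play_game`.

```python
import random

# Hidden payoff weights of the toy game (hypothetical, not from the paper).
TRUE_VALUE = [1.0, -2.0, 0.5]


def evaluate(weights, features):
    """Linear stand-in for the network: score one position's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def choose_move(weights, candidate_positions):
    """Apply the evaluator to every legal successor and keep the best one."""
    return max(candidate_positions, key=lambda feats: evaluate(weights, feats))


def mutate(weights, sigma, rng):
    """Create a slightly mutated challenger by adding Gaussian noise."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def toy_game(challenger, champion, rng, rounds=20, n_moves=4):
    """Toy substitute for a backgammon game; True if the challenger wins."""
    challenger_score = champion_score = 0.0
    for _ in range(rounds):
        # "Roll the dice": draw the random set of legal successor positions.
        moves = [[rng.uniform(-1.0, 1.0) for _ in TRUE_VALUE]
                 for _ in range(n_moves)]
        challenger_score += evaluate(TRUE_VALUE, choose_move(challenger, moves))
        champion_score += evaluate(TRUE_VALUE, choose_move(champion, moves))
    return challenger_score > champion_score


def hillclimb(n_weights, play_game, generations, sigma=0.1, seed=0):
    """Relative-fitness hill climbing, starting from an all-zero champion."""
    rng = random.Random(seed)
    champion = [0.0] * n_weights
    for _ in range(generations):
        challenger = mutate(champion, sigma, rng)
        # Assumed update rule: the winning challenger becomes the champion.
        if play_game(challenger, champion, rng):
            champion = challenger
    return champion


if __name__ == "__main__":
    print(hillclimb(len(TRUE_VALUE), toy_game, generations=500))
```

Fitness here is purely relative: a challenger is never scored against a fixed target, only against the current champion, which is the property the abstract credits for the method's effectiveness.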
