Co-Evolution in the Successful Learning of Backgammon Strategy

Following Tesauro's work on TD-Gammon, we used a 4,000-parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement, or temporal-difference learning methods were employed. Instead, we applied simple hillclimbing in a relative fitness environment: starting from an initial champion of all-zero weights, we play the current champion network against a slightly mutated challenger and change the weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a “meta-game” of self-learning.
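The training loop the abstract describes is simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not the authors' code: the tiny network sizes, the helper names (`choose_move`, `play_game`), the mutation scale, and the stand-in `toy_game` are all hypothetical, and a real run would decide each champion-versus-challenger contest by actual backgammon self-play using the greedy move selection shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; the paper's network has roughly 4,000 weights.
N_INPUTS, N_HIDDEN = 20, 10

def init_champion():
    # The initial champion is the all-zero weight vector, as in the paper.
    return {"w1": np.zeros((N_INPUTS, N_HIDDEN)), "w2": np.zeros(N_HIDDEN)}

def evaluate(weights, features):
    # Feedforward evaluation of one board's feature vector.
    hidden = np.tanh(features @ weights["w1"])
    return float(np.tanh(hidden @ weights["w2"]))

def choose_move(weights, successor_features):
    # Greedy play: score every position reachable with the current dice
    # roll and take the one the network rates highest.
    scores = [evaluate(weights, f) for f in successor_features]
    return int(np.argmax(scores))

def mutate(weights, sigma=0.05):
    # Challenger = champion plus small Gaussian noise on every weight.
    return {k: v + rng.normal(0.0, sigma, v.shape) for k, v in weights.items()}

def hillclimb(play_game, generations=500):
    # play_game(a, b) -> True iff network `a` beats network `b` in a game
    # (or short series) of self-play. No gradients anywhere: the only
    # learning signal is the win/loss outcome of champion vs. challenger.
    champion = init_champion()
    for _ in range(generations):
        challenger = mutate(champion)
        if play_game(challenger, champion):
            champion = challenger  # winner-takes-over update (see note below)
    return champion

if __name__ == "__main__":
    # Stand-in for backgammon self-play so the sketch runs end to end:
    # the network whose weights lie closer to a hidden target "wins".
    target = rng.normal(size=(N_INPUTS, N_HIDDEN))

    def toy_game(a, b):
        return np.linalg.norm(a["w1"] - target) < np.linalg.norm(b["w1"] - target)

    best = hillclimb(toy_game)
    fake_successors = rng.normal(size=(5, N_INPUTS))  # 5 post-move feature vectors
    print("chosen move:", choose_move(best, fake_successors))
```

Wholesale replacement is only one reading of "changing weights if the challenger wins"; the paper itself damps the update, moving the champion only a small fraction of the way toward a winning challenger so that a lucky dice sequence cannot overwrite good weights outright.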

[1] Gerald Tesauro. Connectionist Learning of Expert Preferences by Comparison Training, 1988, NIPS.

[2] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[3] Arthur L. Samuel. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.

[4] Michael F. Shlesinger, et al. Dynamic patterns in complex systems, 1988.

[5] Gerald Tesauro. Practical Issues in Temporal Difference Learning, 1992, Machine Learning.

[6] Gerald Tesauro. Temporal difference learning and TD-Gammon, 1995, CACM.

[7] Peter J. Angeline. An Alternate Interpretation of the Iterated Prisoner's Dilemma and the Evolution of Non-Mutual Cooperation, 1994.

[8] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, 1995, NIPS.

[9] W. Daniel Hillis. Co-evolving parasites improve simulated evolution as an optimization procedure, 1990.

[10] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[11] W. Hamilton, et al. The Evolution of Cooperation, 1984.

[12] Michael L. Littman. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[13] Jordan B. Pollack, et al. Massively parallel genetic programming, 1996.

[14] Susan L. Epstein. Toward an Ideal Trainer, 1994.

[15] D. B. Fogel. Using evolutionary programming to create neural networks that are capable of playing tic-tac-toe, 1993, IEEE International Conference on Neural Networks.

[16] Karl Sims. Evolving 3D morphology and behavior by competition, 1994.

[17] Craig W. Reynolds. Competition, Coevolution and the Game of Tag, 1994.

[18] Terrence J. Sejnowski, et al. Temporal Difference Learning of Position Evaluation in the Game of Go, 1993, NIPS.

[19] J. Maynard Smith. Evolution and the theory of games, 1976.

[20] James P. Crutchfield, et al. Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform Computations, 1993, Complex Systems.

[21] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.

[22] Richard K. Belew, et al. Methods for Competitive Co-Evolution: Finding Opponents Worth Beating, 1995, ICGA.

[23] Dave Cliff, et al. Tracking the Red Queen: Measurements of Adaptive Progress in Co-Evolutionary Simulations, 1995, ECAL.

[24] Peter J. Angeline, et al. Competitive Environments Evolve Better Solutions for Complex Tasks, 1993, ICGA.

[25] Charles E. Taylor, et al. Artificial Life II, 1991.

[26] Michael L. Littman. Algorithms for Sequential Decision Making, 1996.

[27] K. E. Kinnear, Jr. Advances in Genetic Programming, 1994.