The neural network has been used extensively as a vehicle for both genetic algorithms and reinforcement learning. This paper shows a natural way to combine the two methods and suggests that reinforcement learning may be superior to random mutation as an engine for discovering useful substructures. The paper also describes a software experiment that applies this technique to produce an Othello-playing computer program. The experiment subjects a pool of Othello-playing programs to a regime of successive adaptation cycles, each consisting of an evolutionary phase, based on the genetic algorithm, followed by a learning phase, based on reinforcement learning. A key idea of the genetic implementation is the concept of feature-level crossover. The regime ran for three months through 900,000 individual matches of Othello and ultimately yielded a program that is competitive with a human-designed Othello program playing at a roughly intermediate level.
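The abstract's regime of adaptation cycles, an evolutionary phase followed by a learning phase, can be sketched in outline. This is a minimal illustration, not the paper's implementation: the population size, the toy fitness function, and the gradient-style stand-in for the reinforcement-learning phase are all assumptions, and feature-level crossover is shown only in the sense of inheriting each feature weight whole from one parent rather than splicing at arbitrary positions.

```python
import random

POP_SIZE = 8
NUM_FEATURES = 4  # each individual is a weight vector over board features

def random_individual():
    return [random.uniform(-1, 1) for _ in range(NUM_FEATURES)]

def fitness(ind):
    # Stand-in for "win rate over a tournament of Othello matches";
    # this toy objective simply rewards weights near zero.
    return -sum(w * w for w in ind)

def feature_crossover(a, b):
    # Feature-level crossover (illustrative): each feature weight is
    # inherited whole from one parent, never split mid-feature.
    return [random.choice(pair) for pair in zip(a, b)]

def learning_phase(ind, steps=10, lr=0.1):
    # Stand-in for the reinforcement-learning phase: repeatedly nudge
    # each weight to improve the toy objective (a gradient step here,
    # playing the role TD-style learning plays in the paper).
    for _ in range(steps):
        ind = [w - lr * 2 * w for w in ind]
    return ind

def adaptation_cycle(pop):
    # Evolutionary phase: keep the fitter half, refill via crossover.
    pop = sorted(pop, key=fitness, reverse=True)
    survivors = pop[: len(pop) // 2]
    children = [
        feature_crossover(random.choice(survivors), random.choice(survivors))
        for _ in range(len(pop) - len(survivors))
    ]
    # Learning phase: every individual trains before the next cycle.
    return [learning_phase(ind) for ind in survivors + children]

pop = [random_individual() for _ in range(POP_SIZE)]
for _ in range(5):
    pop = adaptation_cycle(pop)
best = max(pop, key=fitness)
```

The point of the structure is that discovery of useful substructure happens in two places: crossover recombines whole features across individuals, while the learning phase refines each individual's weights between generations instead of relying on random mutation alone.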