Reinforcement learning (RL) (Kaelbling et al., 1996) can be used to learn to control an agent by letting it interact with its environment. In general, there are two kinds of reinforcement learning: (1) value-function based reinforcement learning, which relies on heuristic dynamic programming algorithms such as temporal difference learning (Sutton, 1988) and Q-learning (Watkins, 1989), and (2) evolutionary algorithms such as genetic programming (Koza, 1992), Symbiotic Adaptive Neuron Evolution (SANE) (Moriarty & Miikkulainen, 1996), and Enforced SubPopulations (ESP) (Gomez & Miikkulainen, 1998). There is still an ongoing debate about which of these approaches works best for a particular problem. For learning to play games, value-function based RL often seems appropriate, since the Markov assumption holds. For example, Tesauro (1992) used temporal difference learning to let a program learn to play backgammon by playing against itself, which led to human-expert level play. However, for non-Markovian environments, evolutionary approaches may sometimes be more beneficial.
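To make the value-function based family concrete, the following is a minimal sketch of tabular temporal difference learning, TD(0), on a toy problem. The deterministic three-state chain, the function name `td0_chain`, and the values chosen for the discount factor and step size are illustrative assumptions and are not taken from any of the cited works.

```python
def td0_chain(n_states=3, gamma=0.9, alpha=0.5, episodes=200):
    """Tabular TD(0) value estimation on a hypothetical deterministic chain.

    States are 0..n_states-1 and the agent always moves one step right.
    Leaving the last state ends the episode with reward 1; every other
    transition gives reward 0.
    """
    values = [0.0] * n_states
    for _ in range(episodes):
        state = 0
        while state < n_states:
            next_state = state + 1
            terminal = next_state == n_states
            reward = 1.0 if terminal else 0.0
            # The value of the terminal state is defined as 0.
            next_value = 0.0 if terminal else values[next_state]
            # TD(0) update: move V(s) toward the target r + gamma * V(s').
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values


values = td0_chain()
print([round(v, 3) for v in values])
```

Because the chain is deterministic, the estimate for a state that is k steps from the end of the episode approaches gamma^(k-1), so with the assumed gamma of 0.9 the three estimates approach 0.81, 0.9, and 1.0. The update is valid only because the next state's value summarizes everything relevant about the future, which is the Markov assumption mentioned above.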
[1] Risto Miikkulainen, et al. Efficient Reinforcement Learning through Symbiotic Evolution. Machine Learning, 1996.
[2] John R. Koza, et al. Genetic evolution and co-evolution of computer programs. 1991.
[3] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences. Machine Learning, 1988.
[4] G. Tesauro. Practical Issues in Temporal Difference Learning. 1992.
[5] Andrew W. Moore, et al. Reinforcement Learning: A Survey. J. Artif. Intell. Res., 1996.
[6] Risto Miikkulainen, et al. 2-D Pole Balancing with Recurrent Evolutionary Networks. 1998.
[7] Chris Watkins, et al. Learning from delayed rewards. 1989.