Reinforcement learning has increasingly been applied to the optimisation of agent behaviours. The approach has become popular because of its adaptive and unsupervised learning process. One of its key ideas is to estimate the value of agent states. For large state spaces, however, this approach is difficult to implement directly. As a result, various models have been proposed that use function approximators, such as neural networks, to address this problem. This paper focuses on an implementation of value estimation with a particular class of neural networks known as self-organising maps. Experiments with an agent moving in a “gridworld” and with the autonomous robot Khepera demonstrate the benefit of our approach. The results show that the conventional approach, which represents the value function with a look-up table, can be outperformed in terms of memory usage and convergence speed.
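As a rough illustration of the idea described above, the following Python sketch shows one way a self-organising map can serve as a value-function approximator: the map quantises the state space, each map unit carries its own value estimate, and a TD(0) error is spread over the winning unit's neighbourhood. The class name, the one-dimensional map topology, and all hyperparameters are illustrative assumptions, not the exact implementation evaluated in the paper.

```python
import numpy as np

class SOMValueEstimator:
    """Sketch: a self-organising map quantises the state space and each
    map unit holds a value estimate that is updated with TD(0)."""

    def __init__(self, n_units, state_dim, lr_som=0.1, lr_td=0.2,
                 gamma=0.95, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.prototypes = rng.uniform(0.0, 1.0, size=(n_units, state_dim))
        self.values = np.zeros(n_units)      # one value estimate per map unit
        self.grid = np.arange(n_units)       # 1-D map topology for simplicity
        self.lr_som, self.lr_td = lr_som, lr_td
        self.gamma, self.sigma = gamma, sigma

    def winner(self, state):
        # best-matching unit: prototype closest to the observed state
        return int(np.argmin(np.linalg.norm(self.prototypes - state, axis=1)))

    def value(self, state):
        return self.values[self.winner(state)]

    def update(self, state, reward, next_state, done):
        bmu = self.winner(state)
        # Gaussian neighbourhood over map coordinates
        h = np.exp(-((self.grid - bmu) ** 2) / (2 * self.sigma ** 2))
        # move the winner and its neighbours towards the observed state
        self.prototypes += self.lr_som * h[:, None] * (state - self.prototypes)
        # TD(0) target bootstrapped from the next state's winning unit
        target = reward if done else reward + self.gamma * self.value(next_state)
        td_error = target - self.values[bmu]
        # spread the value correction over the neighbourhood as well
        self.values += self.lr_td * h * td_error

# toy usage: random transitions in [0, 1]^2 with reward near one edge
est = SOMValueEstimator(n_units=25, state_dim=2)
rng = np.random.default_rng(1)
s = rng.uniform(size=2)
for _ in range(1000):
    s_next = np.clip(s + rng.normal(scale=0.05, size=2), 0.0, 1.0)
    reward = 1.0 if s_next[0] > 0.9 else 0.0
    est.update(s, reward, s_next, done=False)
    s = s_next
```

Compared with a look-up table, the map needs only as many value entries as it has units, which is where the memory saving reported in the abstract comes from; sharing each update across a neighbourhood of units is what can speed up convergence.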