Abstract: In this paper, we present a new reinforcement learning approach that compensates for the perceptual aliasing problem by varying the policy according to the behavioral context. For this approach, the motivated value (M-value) is introduced as a parameter that temporarily emphasizes the selection probabilities of specific future actions according to the context. In the learning phase, the Q-value renewal error associated with the current state-action pair is memorized as an M-value linked to previously visited experiences. In the control phase, to motivate the next action, the agent recalls the M-values that are linked to the current state and were memorized in past experiences. By combining the M-value with the Q-value, the agent can generate different action selection policies depending on the context, even when it observes the same sensory inputs in different states. The advantage of the proposed approach is that a learning/control system reflecting differences in context can be realized easily, and with little additional memory, through a simple extension of a general reinforcement learning method, Q-learning. To investigate the validity of the proposed method, we apply it to a maze problem that contains perceptual aliasing and compare it with standard Q-learning. The results of the maze experiment show that the proposed approach works effectively in a non-Markov decision process environment involving perceptual aliasing.

Keywords: reinforcement learning, Q-learning, POMDPs, perceptual aliasing.
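
The interplay between the Q-value renewal error and a context-dependent M-value can be illustrated with a minimal sketch. Only the tabular Q-learning update is standard; the abstract does not specify how M-values are stored or combined, so the `(context, observation, action)` key, the additive combination `Q + M`, the Boltzmann policy, and every name and constant below are assumptions made purely for illustration, not the method of the paper.

```python
import math
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_update(Q, obs, action, reward, next_obs, actions):
    """Standard tabular Q-learning step; returns the renewal (TD) error."""
    best_next = max(Q[(next_obs, a)] for a in actions)
    error = reward + GAMMA * best_next - Q[(obs, action)]
    Q[(obs, action)] += ALPHA * error
    return error


def action_probabilities(Q, M, obs, context, actions, temperature=1.0):
    """Boltzmann policy over Q + M.

    Assumption: M is added to Q and indexed by a hypothetical context label.
    """
    prefs = [(Q[(obs, a)] + M[(context, obs, a)]) / temperature for a in actions]
    top = max(prefs)  # subtract the maximum for numerical stability
    weights = [math.exp(p - top) for p in prefs]
    total = sum(weights)
    return [w / total for w in weights]


Q = defaultdict(float)  # Q-values, keyed by (observation, action)
M = defaultdict(float)  # hypothetical M-values, keyed by (context, observation, action)
actions = ["left", "right"]

# One learning step: the agent moves right in an aliased corridor and is rewarded.
err = q_update(Q, "corridor", "right", 1.0, "goal", actions)

# Assumption: the renewal error is memorized as an M-value for the context
# in which the corridor was reached (here, a made-up label).
M[("came_from_west", "corridor", "right")] = err

# Same observation, two different contexts, two different policies.
p_west = action_probabilities(Q, M, "corridor", "came_from_west", actions)
p_east = action_probabilities(Q, M, "corridor", "came_from_east", actions)
```

In this toy setting the observation `"corridor"` is identical in both calls, yet the probability of choosing `"right"` is higher under the context that carries a memorized M-value, which is the qualitative effect the abstract describes.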