Recent research on hidden-state reinforcement learning (RL) problems has concentrated on overcoming partial observability by using memory to estimate states. Switching Q-learning (SQ-learning) is a novel memoryless approach to RL in partially observable environments. The basic idea of SQ-learning is that "non-Markovian" tasks can be automatically decomposed into subtasks solvable by memoryless policies, without any additional information indicating "good" subgoals. To perform this decomposition, SQ-learning employs ordered sequences of Q-modules in which each module discovers a local control policy. Furthermore, a learning automaton with a hierarchical structure is used to find appropriate subgoal sequences. We apply SQ-learning to three partially observable maze problems. The results of extensive simulations demonstrate that SQ-learning can quickly learn optimal or near-optimal policies without a large computational burden.
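The abstract does not give the concrete update rules for the Q-modules, the switching criterion, or the hierarchical learning automaton, so the following Python sketch is only one plausible reading of the decomposition idea: an ordered list of tabular Q-modules acting directly on observations, with control handed to the next module when a subgoal test fires. The AliasedCorridor environment, the subgoal predicates, and all hyperparameters are illustrative assumptions, and the learning automaton that selects subgoal sequences is omitted entirely.

```python
"""Minimal sketch of a switching, memoryless Q-learning agent.

This is NOT the authors' SQ-learning algorithm; it only illustrates the idea of
decomposing a partially observable task into subtasks handled by an ordered
sequence of memoryless Q-modules. Switching rules and subgoals are hand-coded
placeholders here.
"""
import random
from collections import defaultdict


class QModule:
    """One tabular Q-learner operating directly on observations (memoryless)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q[obs]
        return values.index(max(values))

    def update(self, obs, action, reward, next_obs):
        target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


class SwitchingAgent:
    """Ordered sequence of Q-modules; control passes to the next module
    when the active module's (hypothetical) subgoal test fires."""

    def __init__(self, modules, subgoal_tests):
        self.modules = modules
        self.subgoal_tests = subgoal_tests  # one predicate per module
        self.active = 0

    def reset(self):
        self.active = 0

    def step(self, obs, env):
        module = self.modules[self.active]
        action = module.act(obs)
        next_obs, reward, done = env.step(action)
        module.update(obs, action, reward, next_obs)
        # Hand control to the next module once this module's subgoal is reached.
        if self.active + 1 < len(self.modules) and self.subgoal_tests[self.active](next_obs):
            self.active += 1
        return next_obs, reward, done


class AliasedCorridor:
    """Toy partially observable corridor: states 0..4, goal at state 4.
    The middle cells all produce the same observation (aliasing)."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self._obs()

    def _obs(self):
        return self.state if self.state in (0, 4) else 1  # middle cells look alike

    def step(self, action):  # action 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        done = self.state == 4
        reward = 1.0 if done else -0.01
        return self._obs(), reward, done


if __name__ == "__main__":
    env = AliasedCorridor()
    agent = SwitchingAgent(
        modules=[QModule(n_actions=2), QModule(n_actions=2)],
        subgoal_tests=[lambda obs: obs == 1, lambda obs: False],  # illustrative subgoals
    )
    for episode in range(200):
        obs, done = env.reset(), False
        agent.reset()
        while not done:
            obs, reward, done = agent.step(obs, env)
```

In the architecture described by the abstract, the subgoal sequence would be discovered by the hierarchical learning automaton rather than fixed by hand as in this sketch.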