Recent research on hidden-state reinforcement learning (RL) problems has concentrated on overcoming partial observability by using memory to estimate states. Switching Q-learning (SQ-learning) is a novel memoryless approach to RL in partially observable environments. The basic idea of SQ-learning is that "non-Markovian" tasks can be automatically decomposed into subtasks solvable by memoryless policies, without any additional information indicating "good" subgoals. To perform this decomposition, SQ-learning employs ordered sequences of Q-modules in which each module discovers a local control policy. Furthermore, a learning automaton with a hierarchical structure is used to find appropriate subgoal sequences. We apply SQ-learning to three partially observable maze problems. The results of extensive simulations demonstrate that SQ-learning can quickly learn optimal or near-optimal policies without a large computational burden.
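The abstract does not give the concrete update rules for the Q-modules, the switching criterion, or the hierarchical learning automaton, so the following Python sketch is only one plausible reading of the decomposition idea: an ordered list of tabular Q-modules acting directly on observations, with control handed to the next module when a subgoal test fires. The AliasedCorridor environment, the subgoal predicates, and all hyperparameters are illustrative assumptions, and the learning automaton that selects subgoal sequences is omitted entirely.

```python
"""Minimal sketch of a switching, memoryless Q-learning agent.

This is NOT the authors' SQ-learning algorithm; it only illustrates the idea of
decomposing a partially observable task into subtasks handled by an ordered
sequence of memoryless Q-modules. Switching rules and subgoals are hand-coded
placeholders here.
"""
import random
from collections import defaultdict


class QModule:
    """One tabular Q-learner operating directly on observations (memoryless)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q[obs]
        return values.index(max(values))

    def update(self, obs, action, reward, next_obs):
        target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


class SwitchingAgent:
    """Ordered sequence of Q-modules; control passes to the next module
    when the active module's (hypothetical) subgoal test fires."""

    def __init__(self, modules, subgoal_tests):
        self.modules = modules
        self.subgoal_tests = subgoal_tests  # one predicate per module
        self.active = 0

    def reset(self):
        self.active = 0

    def step(self, obs, env):
        module = self.modules[self.active]
        action = module.act(obs)
        next_obs, reward, done = env.step(action)
        module.update(obs, action, reward, next_obs)
        # Hand control to the next module once this module's subgoal is reached.
        if self.active + 1 < len(self.modules) and self.subgoal_tests[self.active](next_obs):
            self.active += 1
        return next_obs, reward, done


class AliasedCorridor:
    """Toy partially observable corridor: states 0..4, goal at state 4.
    The middle cells all produce the same observation (aliasing)."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self._obs()

    def _obs(self):
        return self.state if self.state in (0, 4) else 1  # middle cells look alike

    def step(self, action):  # action 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        done = self.state == 4
        reward = 1.0 if done else -0.01
        return self._obs(), reward, done


if __name__ == "__main__":
    env = AliasedCorridor()
    agent = SwitchingAgent(
        modules=[QModule(n_actions=2), QModule(n_actions=2)],
        subgoal_tests=[lambda obs: obs == 1, lambda obs: False],  # illustrative subgoals
    )
    for episode in range(200):
        obs, done = env.reset(), False
        agent.reset()
        while not done:
            obs, reward, done = agent.step(obs, env)
```

In the architecture described by the abstract, the subgoal sequence would be discovered by the hierarchical learning automaton rather than fixed by hand as in this sketch.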