Switching Q-learning in partially observable Markovian environments

Recent research on hidden-state reinforcement learning (RL) problems has concentrated on overcoming partial observability by using memory to estimate states. Switching Q-learning (SQ-learning) is a novel memoryless approach to RL in partially observable environments. The basic idea of SQ-learning is that "non-Markovian" tasks can be automatically decomposed into subtasks solvable by memoryless policies, without any additional information about "good" subgoals. To carry out this decomposition, SQ-learning employs ordered sequences of Q-modules, each of which discovers a local control policy. Furthermore, a hierarchical structure learning automaton is used to find appropriate subgoal sequences. We apply SQ-learning to three partially observable maze problems. The results of extensive simulations demonstrate that SQ-learning quickly learns optimal or near-optimal policies without a huge computational burden.
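
To make the switching idea concrete, the following is a minimal sketch, not the authors' implementation, of how ordered memoryless Q-modules and a learning automaton over candidate subgoal sequences might interact. The Gym-like environment interface, the subgoal-matching rule, the linear reward-inaction update, and all hyperparameters are illustrative assumptions rather than details taken from the paper.

```python
import random
from collections import defaultdict

class QModule:
    """A memoryless tabular Q-learner mapping observations directly to actions."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        # Epsilon-greedy action selection over the current observation only.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q[obs]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, obs, action, reward, next_obs):
        # Standard one-step Q-learning update, memoryless (no state estimation).
        target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


class LearningAutomaton:
    """A simple linear reward-inaction automaton over candidate subgoal
    sequences; an illustrative stand-in for the hierarchical structure
    learning automaton described in the paper."""
    def __init__(self, n_choices, lr=0.05):
        self.p = [1.0 / n_choices] * n_choices
        self.lr = lr

    def choose(self):
        # Sample a subgoal-sequence index according to the current probabilities.
        r, cum = random.random(), 0.0
        for i, pi in enumerate(self.p):
            cum += pi
            if r <= cum:
                return i
        return len(self.p) - 1

    def reward(self, chosen):
        # Reinforce the chosen sequence; on failure, probabilities are left unchanged.
        self.p = [pi + self.lr * ((1.0 if i == chosen else 0.0) - pi)
                  for i, pi in enumerate(self.p)]


def run_episode(env, modules, subgoals, max_steps=200):
    """Run one episode, handing control from one Q-module to the next whenever
    the current subgoal observation is reached. `env` is assumed (hypothetically)
    to expose reset() and step(action) -> (next_obs, reward, done)."""
    obs = env.reset()
    active = 0  # index of the currently active Q-module
    total_reward = 0.0
    for _ in range(max_steps):
        module = modules[active]
        action = module.act(obs)
        next_obs, reward, done = env.step(action)
        module.update(obs, action, reward, next_obs)
        total_reward += reward
        # Switch to the next module once the current subgoal is observed.
        if active < len(subgoals) and next_obs == subgoals[active]:
            active = min(active + 1, len(modules) - 1)
        obs = next_obs
        if done:
            break
    return total_reward
```

In this sketch, a subgoal sequence would be drawn from the automaton before each episode, the corresponding modules run in order, and the automaton rewarded when the episode succeeds, so that decompositions leading to successful memoryless behavior become more probable over time.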