In this paper we propose a neural architecture for solving continuous-time and continuous-space reinforcement learning problems in non-stationary environments. The method is based on a mechanism for creating, updating, and selecting partial models of the environment. The partial models are estimated incrementally with linear function approximators and are built according to the system's ability to predict a given sequence of observations. We propose and formalize this method and demonstrate its efficiency on the non-stationary pendulum task. We show that the neural architecture with context detection outperforms a model-based RL algorithm and performs nearly as well as the optimum, that is, a hypothetical system whose sensory capabilities are extended so that the environment effectively appears stationary. Finally, we discuss known limitations of the method and future work.
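To make the mechanism concrete, the sketch below illustrates one way such a context-detection loop could be organized: a pool of incrementally trained linear partial models, where the model that best predicts recent transitions is selected as the active context and a new model is created when every existing model predicts poorly. The class names, the least-mean-squares update, and the error threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


class PartialModel:
    """Linear predictor of the next observation from the current
    observation and action: x' ~= W @ [x; a; 1]."""

    def __init__(self, obs_dim, act_dim, lr=0.1):
        self.W = np.zeros((obs_dim, obs_dim + act_dim + 1))
        self.lr = lr

    def _features(self, obs, act):
        return np.concatenate([obs, act, [1.0]])

    def predict(self, obs, act):
        return self.W @ self._features(obs, act)

    def error(self, obs, act, next_obs):
        # Instantaneous prediction error for this transition.
        return float(np.linalg.norm(self.predict(obs, act) - next_obs))

    def update(self, obs, act, next_obs):
        # Incremental least-mean-squares step toward the observed transition.
        phi = self._features(obs, act)
        residual = next_obs - self.W @ phi
        self.W += self.lr * np.outer(residual, phi)


class ContextDetector:
    """Maintains a pool of partial models; the model that best explains the
    current transition is treated as the active context, and a new model is
    spawned when no existing model predicts well."""

    def __init__(self, obs_dim, act_dim, new_model_threshold=1.0):
        self.obs_dim, self.act_dim = obs_dim, act_dim
        self.threshold = new_model_threshold
        self.models = [PartialModel(obs_dim, act_dim)]
        self.active = 0

    def step(self, obs, act, next_obs):
        errors = [m.error(obs, act, next_obs) for m in self.models]
        best = int(np.argmin(errors))
        if errors[best] > self.threshold:
            # No model explains the transition: assume a new context appeared.
            self.models.append(PartialModel(self.obs_dim, self.act_dim))
            best = len(self.models) - 1
        self.active = best
        self.models[best].update(obs, act, next_obs)
        return self.active
```

In use, the agent would call `step` once per observed transition and plan or learn a policy with the currently active partial model; the paper's actual method additionally smooths prediction quality over a sequence of observations rather than a single transition.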