Reinforcement Learning in Non-Stationary Continuous Time and Space Scenarios

In this paper we propose a neural architecture for solving continuous-time and continuous-space reinforcement learning problems in non-stationary environments. The method is based on a mechanism for creating, updating, and selecting partial models of the environment. The partial models are estimated incrementally using linear function approximators and are built according to the system's ability to make predictions about a given sequence of observations. We formalize this method and demonstrate its effectiveness on the non-stationary pendulum task. We show that the neural architecture with context detection outperforms a model-based RL algorithm and performs almost as well as the optimum, that is, a hypothetical system whose extended sensor capabilities make the environment effectively appear stationary. Finally, we discuss known limitations of the method and directions for future work.
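To make the mechanism concrete, the sketch below illustrates one plausible realization of context detection with incrementally estimated linear partial models: each model predicts the next observation, the best predictor is activated and adapted, and a new model is spawned when no existing model predicts well enough. The class names, the LMS-style update, and the error threshold are illustrative assumptions, not the paper's exact formulation.

import numpy as np

class PartialModel:
    """One linear partial model of the dynamics: s' ~ W @ [s; a; 1]."""
    def __init__(self, state_dim, action_dim, lr=0.1):
        self.W = np.zeros((state_dim, state_dim + action_dim + 1))
        self.lr = lr

    def features(self, s, a):
        return np.concatenate([s, a, [1.0]])

    def predict(self, s, a):
        return self.W @ self.features(s, a)

    def update(self, s, a, s_next):
        # Incremental LMS update of the linear approximator.
        x = self.features(s, a)
        err = s_next - self.predict(s, a)
        self.W += self.lr * np.outer(err, x)

class ContextDetector:
    """Pool of partial models; the best predictor of the current observation
    sequence is selected, and a new model is created when all predict poorly."""
    def __init__(self, state_dim, action_dim, new_model_threshold=0.5):
        self.dims = (state_dim, action_dim)
        self.models = [PartialModel(state_dim, action_dim)]
        self.threshold = new_model_threshold  # illustrative choice
        self.active = 0

    def step(self, s, a, s_next):
        # Score every model by its one-step squared prediction error.
        errors = [np.sum((m.predict(s, a) - s_next) ** 2) for m in self.models]
        best = int(np.argmin(errors))
        if errors[best] > self.threshold:
            # No model explains the observation: assume a new context appeared.
            self.models.append(PartialModel(*self.dims))
            best = len(self.models) - 1
        self.active = best
        # Only the active model is adapted, so inactive contexts are preserved.
        self.models[best].update(s, a, s_next)
        return self.active

Under this reading, non-stationarity is handled by keeping one model per context rather than overwriting a single model, which is why the approach can approach the hypothetical stationary-appearing baseline described above.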