The Markov Decision Process Extraction Network

This paper presents the Markov decision process extraction network, a data-efficient, automatic state estimation approach for discrete-time reinforcement learning (RL) based on recurrent neural networks. The architecture is designed to model only the minimal relevant dynamics of an environment: it condenses large sets of continuous observables into a compact state representation while excluding irrelevant information. To the best of our knowledge, it is the first published approach that automatically extracts the minimal relevant aspects of the dynamics from observations in order to model a Markov decision process suitable for RL, without requiring special knowledge of the environment under consideration. The capabilities of the neural state estimation approach are evaluated on the cart-pole problem using standard table-based policy iteration.
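To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of the kind of recurrent state estimation the abstract describes, assuming PyTorch: a recurrent encoder condenses a history of continuous observables into a compact state vector, and a small decoder predicts the next observation from that state and the action, so only dynamics-relevant information needs to be retained. All class names, layer sizes, and dimensions (e.g. the 4 cart-pole observables and a 2-dimensional state) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentStateEstimator(nn.Module):
    """Sketch: condense an observation history into a compact, dynamics-relevant state."""

    def __init__(self, obs_dim: int, state_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.rnn = nn.RNN(obs_dim, hidden_dim, batch_first=True)  # processes the observation history
        self.to_state = nn.Linear(hidden_dim, state_dim)          # bottleneck: compact state representation
        self.predict_next = nn.Linear(state_dim + 1, obs_dim)     # predicts next observation from state + action

    def forward(self, obs_seq: torch.Tensor, action: torch.Tensor):
        _, h = self.rnn(obs_seq)             # final hidden state summarizing the history
        state = self.to_state(h.squeeze(0))  # condensed state; trained to keep only relevant dynamics
        next_obs = self.predict_next(torch.cat([state, action], dim=-1))
        return state, next_obs

# Hypothetical usage: 4 observables (as in cart-pole), 2-dimensional compact state.
model = RecurrentStateEstimator(obs_dim=4, state_dim=2)
obs_history = torch.randn(8, 10, 4)   # batch of 8 histories, 10 time steps each
actions = torch.randn(8, 1)           # one scalar action per history
state, next_obs_pred = model(obs_history, actions)
loss = nn.functional.mse_loss(next_obs_pred, torch.randn(8, 4))  # trained to predict the next observation
```

In such a setup, the compact state produced by the encoder could then be discretized and handed to a table-based RL method such as policy iteration, in the spirit of the evaluation described above.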