The Markov Decision Process Extraction Network

This paper presents the Markov decision process extraction network, a data-efficient, automatic state estimation approach for discrete-time reinforcement learning (RL) based on recurrent neural networks. The architecture is designed to model only the minimal relevant dynamics of an environment: it condenses large sets of continuous observables into a compact state representation while excluding irrelevant information. To the best of our knowledge, it is the first published approach that automatically extracts the minimal relevant aspects of the dynamics from observations in order to model a Markov decision process suitable for RL, without requiring special knowledge of the environment under consideration. The capabilities of the neural state estimation approach are evaluated on the cart-pole problem using standard table-based policy iteration.
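To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of the kind of recurrent state estimation the abstract describes, assuming PyTorch: a recurrent encoder condenses a history of continuous observables into a compact state vector, and a small decoder predicts the next observation from that state and the action, so only dynamics-relevant information needs to be retained. All class names, layer sizes, and dimensions (e.g. the 4 cart-pole observables and a 2-dimensional state) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentStateEstimator(nn.Module):
    """Sketch: condense an observation history into a compact, dynamics-relevant state."""

    def __init__(self, obs_dim: int, state_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.rnn = nn.RNN(obs_dim, hidden_dim, batch_first=True)  # processes the observation history
        self.to_state = nn.Linear(hidden_dim, state_dim)          # bottleneck: compact state representation
        self.predict_next = nn.Linear(state_dim + 1, obs_dim)     # predicts next observation from state + action

    def forward(self, obs_seq: torch.Tensor, action: torch.Tensor):
        _, h = self.rnn(obs_seq)             # final hidden state summarizing the history
        state = self.to_state(h.squeeze(0))  # condensed state; trained to keep only relevant dynamics
        next_obs = self.predict_next(torch.cat([state, action], dim=-1))
        return state, next_obs

# Hypothetical usage: 4 observables (as in cart-pole), 2-dimensional compact state.
model = RecurrentStateEstimator(obs_dim=4, state_dim=2)
obs_history = torch.randn(8, 10, 4)   # batch of 8 histories, 10 time steps each
actions = torch.randn(8, 1)           # one scalar action per history
state, next_obs_pred = model(obs_history, actions)
loss = nn.functional.mse_loss(next_obs_pred, torch.randn(8, 4))  # trained to predict the next observation
```

In such a setup, the compact state produced by the encoder could then be discretized and handed to a table-based RL method such as policy iteration, in the spirit of the evaluation described above.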