Paper Information - Reinforcement Learning with Long Short-Term Memory

This paper presents reinforcement learning with a Long Short-Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task.

Bram Bakker

[1] Jürgen Schmidhuber, et al. Networks adjusting networks, 1990, Forschungsberichte, TU Munich.

[2] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, 1991 IEEE International Joint Conference on Neural Networks.

[3] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.

[4] Tom M. Mitchell, et al. Reinforcement learning with hidden states, 1993.

[5] Mance E. Harmon, et al. Multi-Agent Residual Advantage Learning with General Function Approximation, 1996.

[6] Mark Harmon. Multi-player residual advantage learning with general function, 1996.

[7] Maja J. Matarić, et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks, 1996.

[8] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[9] John Loch, et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes, 1998, ICML.

[10] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[11] Leslie Pack Kaelbling, et al. Learning Policies with External Memory, 1999, ICML.

[12] Jürgen Schmidhuber, et al. Learning to forget: continual prediction with LSTM, 1999.

[13] Bram Bakker, et al. Reinforcement Learning with LSTM in Non-Markovian Tasks with Long-Term Dependencies, 2001.