Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments

Reinforcement Learning (RL) has been widely used to solve problems that provide only little feedback from the environment. Q-learning solves Markov Decision Processes (MDPs) quite well, and for Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate the Q-values. However, learning in these problems typically takes very long. In this paper, we present a method that speeds up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent reaches the goal, it examines the episode and relearns the necessary actions. We use a table that stores, for each state, the minimum number of its appearances over all successful episodes; this table is used to remove unnecessary state-action pairs from a successful episode and to form a min-episode. To verify the method, we performed two experiments: the E-maze problem with a Time-Delay Neural Network (TDNN) and the lighting grid world problem with a Long Short-Term Memory (LSTM) RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.
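To make the min-episode construction concrete, the following Python sketch shows one plausible reading of the table-based filtering described above. It is a minimal illustration, not the paper's implementation: the names (min_count, update_min_counts, min_episode) are hypothetical, and keeping the first occurrences of each repeated state is an assumption, since the abstract does not specify which occurrences are retained.

from collections import Counter

# Hypothetical table: state -> minimum number of times that state has
# appeared across all successful episodes seen so far.
min_count = {}

def update_min_counts(episode):
    """Update the table with a new successful episode,
    given as a list of (state, action) pairs."""
    counts = Counter(state for state, _ in episode)
    for state, n in counts.items():
        min_count[state] = min(min_count.get(state, n), n)

def min_episode(episode):
    """Form the min-episode: keep at most min_count[state] occurrences
    of each state, dropping the extra (presumably unnecessary)
    state-action pairs. Keeping the earliest occurrences is an
    assumption made for this sketch."""
    seen = Counter()
    trimmed = []
    for state, action in episode:
        seen[state] += 1
        if seen[state] <= min_count.get(state, 0):
            trimmed.append((state, action))
    return trimmed

# Usage: after each successful episode, update the table, then relearn
# (e.g., additional Q-updates or RNN training passes) on the trimmed
# min-episode instead of the full episode.
episode = [("s0", "right"), ("s1", "right"), ("s1", "up"), ("s2", "stay")]
update_min_counts(episode)
print(min_episode(episode))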
