Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments

Reinforcement Learning (RL) has been widely used to solve problems that provide only little feedback from the environment. Q-learning solves Markov Decision Processes (MDPs) quite well, and for Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate the Q-values. However, learning in these problems typically takes very long. In this paper, we present a method that speeds up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent reaches the goal, it examines the episode and relearns the necessary actions. We use a table that stores, for each state, the minimum number of its appearances over all successful episodes; this table is used to remove unnecessary state-action pairs from a successful episode and to form a min-episode. To verify the method, we performed two experiments: the E-maze problem with a Time-Delay Neural Network (TDNN) and the lighting grid world problem with a Long Short-Term Memory (LSTM) RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.
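To make the min-episode construction concrete, the following Python sketch shows one plausible reading of the table-based filtering described above. It is a minimal illustration, not the paper's implementation: the names (min_count, update_min_counts, min_episode) are hypothetical, and keeping the first occurrences of each repeated state is an assumption, since the abstract does not specify which occurrences are retained.

from collections import Counter

# Hypothetical table: state -> minimum number of times that state has
# appeared across all successful episodes seen so far.
min_count = {}

def update_min_counts(episode):
    """Update the table with a new successful episode,
    given as a list of (state, action) pairs."""
    counts = Counter(state for state, _ in episode)
    for state, n in counts.items():
        min_count[state] = min(min_count.get(state, n), n)

def min_episode(episode):
    """Form the min-episode: keep at most min_count[state] occurrences
    of each state, dropping the extra (presumably unnecessary)
    state-action pairs. Keeping the earliest occurrences is an
    assumption made for this sketch."""
    seen = Counter()
    trimmed = []
    for state, action in episode:
        seen[state] += 1
        if seen[state] <= min_count.get(state, 0):
            trimmed.append((state, action))
    return trimmed

# Usage: after each successful episode, update the table, then relearn
# (e.g., additional Q-updates or RNN training passes) on the trimmed
# min-episode instead of the full episode.
episode = [("s0", "right"), ("s1", "right"), ("s1", "up"), ("s2", "stay")]
update_min_counts(episode)
print(min_episode(episode))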
