A robot that reinforcement-learns to identify and memorize important previous observations

Traditional reinforcement learning algorithms are difficult to apply to robots because of large and continuous domains, partial observability, and the limited number of learning experiences available. This paper addresses these problems by combining (1) reinforcement learning with memory, implemented using an LSTM recurrent neural network whose inputs are discrete events extracted from raw inputs, and (2) online exploration with offline policy learning. An experiment with a real robot demonstrates the feasibility of the methodology.
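To make the idea of "discrete events extracted from raw inputs" concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the extraction method, so a simple nearest-prototype quantizer is swapped in for illustration: each raw observation is mapped to its closest prototype, and an event is emitted only when that class changes. The names `nearest_prototype` and `extract_events`, the prototype vectors, and the sample stream are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_prototype(obs: Vector, prototypes: Sequence[Vector]) -> int:
    """Return the index of the prototype closest to obs (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, proto in enumerate(prototypes):
        dist = sum((a - b) ** 2 for a, b in zip(obs, proto))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def extract_events(stream: Sequence[Vector],
                   prototypes: Sequence[Vector]) -> List[Tuple[int, int]]:
    """Compress a raw observation stream into (time_step, class) events.

    An event is emitted only when the observation class changes, so long
    stretches of similar readings collapse into a single discrete input.
    This is an illustrative stand-in, not the paper's extraction method.
    """
    events: List[Tuple[int, int]] = []
    previous_class = None
    for t, obs in enumerate(stream):
        current_class = nearest_prototype(obs, prototypes)
        if current_class != previous_class:
            events.append((t, current_class))
            previous_class = current_class
    return events


# Hypothetical two-dimensional sensor readings and two hand-picked prototypes.
prototypes = [(0.0, 0.0), (1.0, 1.0)]
stream = [(0.1, 0.0), (0.0, 0.1), (0.9, 1.0), (1.0, 0.9), (0.1, 0.1)]
events = extract_events(stream, prototypes)
print(events)
```

In a setup like the one the abstract describes, a sequence of such events, rather than every raw sensor reading, would be what the recurrent network receives, which shortens the time spans it has to remember across.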
