Emergence of Prediction by Reinforcement Learning Using a Recurrent Neural Network

For a robot to behave flexibly in the real world, it must learn the necessary functions autonomously, without being given substantial prior information by a human. Among such functions, this paper focuses on the learning of "prediction," which has recently attracted attention from the viewpoint of autonomous learning. The authors point out that it is important to acquire through learning not only how to predict future information, but also how to purposively extract the prediction target from sensor signals. They suggest that, through reinforcement learning with a recurrent neural network, both abilities emerge purposively and simultaneously, without testing individually whether each piece of information is predictable. In a task where an agent is rewarded for catching a moving object that may become invisible, the agent learned to extract the object's velocity while the object was still visible, to relay this information among some of the hidden neurons, and finally to catch the object at an appropriate position and timing, taking into account bounces off the walls after the object disappeared.
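To make the setting concrete, the following is a minimal, hypothetical sketch of reinforcement learning with a recurrent neural network on a partially observable catching task. The environment (CatchEnv), network sizes, and hyperparameters are illustrative assumptions and are not taken from the paper. Unlike the authors' approach, in which the whole network is trained by reinforcement learning so that the hidden neurons themselves come to carry the needed velocity information, this simplification keeps the recurrent weights fixed and applies a one-step Q-learning (TD) update to the readout weights only.

import numpy as np

rng = np.random.default_rng(0)

class CatchEnv:
    """Toy 1-D catching task (illustrative only): an object drifts sideways,
    bouncing off the side walls, and must be caught after a fixed number of
    steps.  Its position is hidden (zeroed in the observation) after the
    first few steps, so the agent has to remember the drift direction."""
    def __init__(self, width=5, fall_steps=6, visible_steps=3):
        self.W, self.T_fall, self.T_vis = width, fall_steps, visible_steps
    def reset(self):
        self.t = 0
        self.obj = int(rng.integers(self.W))
        self.vel = int(rng.choice([-1, 1]))
        self.agent = self.W // 2
        return self._obs()
    def _obs(self):
        visible = self.t < self.T_vis
        return np.array([self.obj / self.W if visible else 0.0,
                         float(visible),
                         self.agent / self.W,
                         self.t / self.T_fall])
    def step(self, a):                       # actions: 0 = left, 1 = stay, 2 = right
        self.agent = int(np.clip(self.agent + (a - 1), 0, self.W - 1))
        self.obj += self.vel
        if self.obj < 0 or self.obj > self.W - 1:
            self.vel = -self.vel             # bounce off a wall
            self.obj += 2 * self.vel
        self.t += 1
        done = self.t >= self.T_fall
        reward = 1.0 if done and self.agent == self.obj else 0.0
        return self._obs(), reward, done

# Elman-style recurrent network: the hidden state integrates the current
# observation with its own previous value, so information observed before
# the object disappears can still influence later actions.  Sizes assumed.
N_IN, N_HID, N_ACT = 4, 16, 3
W_in  = rng.normal(0.0, 0.3, (N_HID, N_IN))
W_rec = rng.normal(0.0, 0.3, (N_HID, N_HID))
W_out = np.zeros((N_ACT, N_HID))
ALPHA, GAMMA, EPS = 0.05, 0.95, 0.1          # learning rate, discount, exploration

def forward(obs, h_prev):
    h = np.tanh(W_in @ obs + W_rec @ h_prev)
    return h, W_out @ h                      # hidden state, Q-value per action

def run_episode(env):
    h = np.zeros(N_HID)
    h, q = forward(env.reset(), h)
    total = 0.0
    while True:
        a = int(rng.integers(N_ACT)) if rng.random() < EPS else int(np.argmax(q))
        obs, r, done = env.step(a)
        h2, q2 = forward(obs, h)
        target = r if done else r + GAMMA * np.max(q2)
        W_out[a] += ALPHA * (target - q[a]) * h   # one-step TD update, readout only
        h, q = h2, q2
        total += r
        if done:
            return total

env = CatchEnv()
returns = [run_episode(env) for _ in range(5000)]
print("success rate over last 500 episodes:", np.mean(returns[-500:]))

The only point this sketch illustrates is the structure of the problem: the later observations no longer contain the object's position, so whatever the agent needs about the object's motion must be extracted while it is visible and carried forward in the recurrent hidden state.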
