A possibility for implementing curiosity and boredom in model-building neural controllers

This paper introduces a framework for`curious neural controllers' which employ an adaptive world model for goal directed on-line learning. First an on-line reinforcement learning algorithm for autonomousànimats' is described. The algorithm is based on two fully recurrent`self-supervised' continually running networks which learn in parallel. One of the networks learns to represent a complete model of the environmental dynamics and is called thèmodel network'. It provides completècredit assignment paths' into the past for the second network which controls the animats physical actions in a possibly reactive environment. The an-imats goal is to maximize cumulative reinforcement and minimize cumulativèpain'. The algorithm has properties which allow to implement something like the desire to improve the model network's knowledge about the world. This is related to curiosity. It is described how the particular algorithm (as well as similar model-building algorithms) may be augmented by dynamic curiosity and boredom in a natural manner. This may be done by introducing (delayed) reinforcement for actions that increase the model network's knowledge about the world. This in turn requires the model network to model its own ignorance, thus showing a rudimentary form of self-introspective behavior.

[1]  H. Franke,et al.  Ästhetik als Informationsverarbeitung , 1974 .

[2]  Anthony J. Robinson,et al.  Static and Dynamic Error Propagation Networks with Application to Speech Coding , 1987, NIPS.

[3]  Michael I. Jordan Supervised learning and systems with excess degrees of freedom , 1988 .

[4]  R. J. Williams,et al.  On the use of backpropagation in associative reinforcement learning , 1988, IEEE 1988 International Conference on Neural Networks.

[5]  B. Widrow,et al.  The truck backer-upper: an example of self-learning in neural networks , 1989, International 1989 Joint Conference on Neural Networks.

[6]  Frank Fallside,et al.  Dynamic reinforcement driven error propagation networks with application to game playing , 1989 .

[7]  Ronald J. Williams,et al.  Experimental Analysis of the Real-time Recurrent Learning Algorithm , 1989 .

[8]  Jürgen Schmidhuber,et al.  A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks , 1989 .

[9]  P. J. Werbos,et al.  Backpropagation and neurocontrol: a review and prospectus , 1989, International 1989 Joint Conference on Neural Networks.

[10]  Jürgen Schmidhuber,et al.  Reinforcement Learning with Interacting Continually Running Fully Recurrent Networks , 1990 .

[11]  Jürgen Schmidhuber,et al.  Recurrent networks adjusted by adaptive critics , 1990 .

[12]  Jürgen Schmidhuber,et al.  An on-line algorithm for dynamic reinforcement learning and planning in reactive environments , 1990, 1990 IJCNN International Joint Conference on Neural Networks.