Reinforcement Learning with Interacting Continually Running Fully Recurrent Networks

We describe an on-line learning algorithm for attacking the fundamental credit assignment problem in non-stationary reactive environments. Reinforcement and pain are treated as special types of input to an agent living in the environment. The agent's only goal is to maximize cumulative reinforcement and to minimize cumulative pain. This simple goal may require the production of complicated action sequences. Supervised learning techniques for recurrent networks serve to construct a differentiable model of the environmental dynamics which includes a model of future reinforcement. While this model is being adapted, it is concurrently used for learning goal-directed behavior. The method extends work done by Munro, Robinson and Fallside, Werbos, Widrow, and Jordan.
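The core idea, a differentiable model of reinforcement trained by supervised learning and concurrently used to obtain gradients for a controller, can be illustrated with a deliberately tiny sketch. Everything below is a hypothetical toy setup, not the paper's recurrent architecture: scalar state and action, a linear controller in place of a recurrent network, and a quadratic reinforcement model, with exploration noise standing in for probabilistic output units.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment (unknown to the agent): the reinforcement
# r = -(a - 0.5*s)^2 is maximized by the action a = 0.5*s.
def reinforcement(s, a):
    return -(a - 0.5 * s) ** 2

# Differentiable model of reinforcement, quadratic in the action so that
# its gradient with respect to the action is informative.
wm = rng.normal(scale=0.1, size=5)            # model weights
def model_r(s, a):
    return wm[0]*s + wm[1]*a + wm[2]*a*s + wm[3]*a*a + wm[4]

wc = 0.0                                      # controller: a = wc*s (+ noise)
lr_m, lr_c = 0.05, 0.02

for step in range(5000):
    s = rng.uniform(-1.0, 1.0)
    a = wc * s + rng.normal(scale=0.3)        # exploration noise

    # On-line supervised model update: minimize the squared prediction
    # error (model_r(s, a) - r)^2 on the observed triple.
    err = model_r(s, a) - reinforcement(s, a)
    wm -= lr_m * 2.0 * err * np.array([s, a, a*s, a*a, 1.0])

    # Concurrent controller update: ascend the MODEL's reinforcement
    # gradient, chained through the action: (d r_hat/d a) * (d a/d wc).
    dr_da = wm[1] + wm[2]*s + 2.0*wm[3]*a
    wc += lr_c * dr_da * s

# wc should end up near the optimal gain 0.5, even though the controller
# never sees the true reinforcement gradient, only the model's.
```

The point of the sketch is the interleaving: the model is never fully trained before use, yet its (imperfect) gradient already steers the controller, which matches the abstract's claim that the model is adapted and exploited concurrently.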