Paper Information - Learning Algorithms for Networks with Internal and External Feedback

Abstract: This paper gives an overview of some novel algorithms for reinforcement learning in non-stationary, possibly reactive environments. I have decided to describe many ideas briefly rather than going into great detail on any one idea. The paper is structured as follows. The first section introduces some terminology. Five sections follow, each headed by a short abstract. The second section describes the entirely local ‘neural bucket brigade algorithm’. The third section applies Sutton's TD methods to fully recurrent, continually running probabilistic networks. The fourth section describes an algorithm based on system identification and on two interacting fully recurrent ‘self-supervised’ learning networks. The fifth section describes an application of adaptive control techniques to adaptive attentive vision: it demonstrates how ‘selective attention’ can be learned. Finally, the sixth section criticizes methods based on system identification and adaptive critics, and describes an adaptive subgoal generator.

Author: Jürgen Schmidhuber
