Reinforcement Learning by Construction of Hypothetical Targets

A general approach to delayed reinforcement learning using supervised training algorithms is proposed. The approach is monolithic and direct, and involves only minor modifications to any supervised learning algorithm. Each connection has two weight-change registers, one for eventual success and one for eventual failure, and the network is trained on self-generated hypothetical target vectors. The method is a close but more general relative of selective bootstrap adaptation, and it is tested on an abstract model of the Link Allocation problem in Asynchronous Transfer Mode (ATM) telecommunication networks.
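The abstract describes the two-register mechanism only in outline. The following is a minimal Python sketch of the idea under assumptions the abstract does not state: a single layer of stochastic binary logistic units trained with the delta rule, no bias term, and a failure target equal to the complement of the emitted output (the choice made in selective bootstrap adaptation; the paper's own, more general target construction may differ). The names `TwoRegisterNet` and `run_demo` and the toy task are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TwoRegisterNet:
    """Hypothetical single layer of stochastic binary logistic units.

    Each connection w[j][i] has two pending weight-change registers:
    dw_success is committed if the episode eventually succeeds,
    dw_failure if it eventually fails. No bias term, for brevity.
    """

    def __init__(self, n_in, n_out, lr=0.5, seed=0):
        self.rng = random.Random(seed)
        self.lr = lr
        self.w = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.reset_registers()

    def reset_registers(self):
        self.dw_success = [[0.0] * len(row) for row in self.w]
        self.dw_failure = [[0.0] * len(row) for row in self.w]

    def probs(self, x):
        """P(output j = 1) for every unit j."""
        return [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in self.w]

    def act(self, x):
        """Sample binary outputs and record delta-rule updates toward two
        self-generated hypothetical targets. The weights are not changed yet."""
        p = self.probs(x)
        y = [1 if self.rng.random() < pj else 0 for pj in p]
        for j, (pj, yj) in enumerate(zip(p, y)):
            t_success = yj       # hypothesis: the emitted output was right
            t_failure = 1 - yj   # assumption: the opposite output was right
            for i, xi in enumerate(x):
                self.dw_success[j][i] += self.lr * (t_success - pj) * xi
                self.dw_failure[j][i] += self.lr * (t_failure - pj) * xi
        return y

    def end_episode(self, success):
        """Delayed reinforcement arrives: commit one register, clear both."""
        chosen = self.dw_success if success else self.dw_failure
        for row, drow in zip(self.w, chosen):
            for i, d in enumerate(drow):
                row[i] += d
        self.reset_registers()


def run_demo(episodes=2000, steps=2, seed=0):
    """Hypothetical toy task: one-hot context c in {0, 1}, the right output
    is c, and an episode succeeds only if every step was right. The outcome
    is revealed only at the end of the episode."""
    net = TwoRegisterNet(n_in=2, n_out=1, seed=seed)
    ctx_rng = random.Random(seed + 1)
    for _ in range(episodes):
        all_right = True
        for _ in range(steps):
            c = ctx_rng.randint(0, 1)
            right = net.act([float(c), float(1 - c)])[0] == c
            all_right = all_right and right
        net.end_episode(all_right)
    # Greedy accuracy over both contexts after training.
    hits = sum((net.probs([float(c), float(1 - c)])[0] >= 0.5) == (c == 1)
               for c in (0, 1))
    return hits / 2, net


if __name__ == "__main__":
    accuracy, net = run_demo()
    print("greedy accuracy:", accuracy, "weights:", net.w)
```

With `steps=1` the reinforcement is immediate, and in this toy task whichever register is committed moves the weights toward the correct output. With `steps > 1`, a failed episode also pushes the steps that were right toward the wrong output, which is the credit-assignment noise that delayed reinforcement introduces. Keeping two registers per connection means the memory needed grows with the number of weights, not with the episode length.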

Olle Gällmo, L. Asplund

References

[1] Bernard Widrow et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems, 1973, IEEE Transactions on Systems, Man, and Cybernetics.

[2] P. Anandan et al. Pattern-recognizing stochastic learning automata, 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[3] Geoffrey E. Hinton et al. Learning internal representations by error propagation, 1986.

[4] B. Widrow et al. The truck backer-upper: an example of self-learning in neural networks, 1989, International 1989 Joint Conference on Neural Networks.

[5] Frank Fallside et al. Dynamic reinforcement driven error propagation networks with application to game playing, 1989.

[6] Rainer Händel et al. Integrated broadband networks: an introduction to ATM-based networks, 1991.

[7] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.