Reinforcement Learning by Construction of Hypothetical Targets

A general approach to delayed reinforcement learning using supervised training algorithms is proposed. The approach is monolithic and direct, and requires only minor modifications to any supervised learning algorithm. Each connection has two weight-change registers, one for eventual success and one for eventual failure, and the network is trained on self-generated hypothetical target vectors. The method is a close, but more general, relative of selective bootstrap adaptation, and is tested on an abstract model of the Link Allocation problem in Asynchronous Transfer Mode (ATM) telecommunication networks.
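
The two-register idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this work; it rests on several assumptions made only for illustration: a single logistic unit trained with the delta rule stands in for the supervised learner, the hypothetical target for eventual success is taken to be the action actually chosen, and the hypothetical target for eventual failure is taken to be the opposite action. The names `TwoRegisterUnit`, `act` and `reinforce` are hypothetical.

```python
import math
import random


class TwoRegisterUnit:
    """Single logistic unit (assumed learner) with two weight-change registers."""

    def __init__(self, n_inputs, lr=0.5, seed=0):
        self.rng = random.Random(seed)
        self.w = [0.0] * (n_inputs + 1)  # last weight is the bias
        self.lr = lr
        self.reset_registers()

    def reset_registers(self):
        # One accumulated weight change per connection for each possible outcome.
        self.dw_success = [0.0] * len(self.w)
        self.dw_failure = [0.0] * len(self.w)

    def output(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, list(x) + [1.0]))
        return 1.0 / (1.0 + math.exp(-s))

    def act(self, x):
        """Choose a stochastic binary action and accumulate both hypothetical updates."""
        y = self.output(x)
        action = 1 if self.rng.random() < y else 0
        # Assumed hypothetical targets: the action itself if the episode
        # eventually succeeds, the opposite action if it eventually fails.
        target_if_success = action
        target_if_failure = 1 - action
        for i, xi in enumerate(list(x) + [1.0]):
            self.dw_success[i] += self.lr * (target_if_success - y) * xi
            self.dw_failure[i] += self.lr * (target_if_failure - y) * xi
        return action

    def reinforce(self, success):
        """Apply the register matching the delayed outcome, then clear both."""
        register = self.dw_success if success else self.dw_failure
        self.w = [wi + d for wi, d in zip(self.w, register)]
        self.reset_registers()


# Usage: several actions are taken before a single delayed reinforcement arrives.
unit_demo = TwoRegisterUnit(n_inputs=2)
actions = [unit_demo.act(x) for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
unit_demo.reinforce(success=True)
```

In this sketch the weights are left untouched during an episode, so the supervised update rule itself is unchanged; only the point at which the accumulated changes are applied depends on the reinforcement signal.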