Reinforcement learning using back-propagation as a building block

A novel unsupervised reinforcement learning rule is introduced that uses the supervised backpropagation algorithm as a building block. The learning rule is easy to understand and to implement in software, and it builds on the accumulated experience of researchers using backpropagation. Unlike most reinforcement learning systems, the new rule can operate with either continuously valued or binary outputs. It tolerates a wide variety of performance measures, with no restriction on their range or variability. The technique should find application in most reinforcement learning situations and should be of particular benefit for learning control systems.