A Self-Organising State Space Decoder for Reinforcement Learning

A novel self-organising architecture, loosely based upon a particular implementation of adaptive resonance theory, is proposed here as an alternative to the fixed state space decoder in the seminal reinforcement learning implementation of Barto, Sutton and Anderson. A well-known non-linear control problem is considered, and the results are compared with those of the original study. The objective is to illustrate the possibility of neurocontrollers that adaptively partition state space through experience, without the need for a priori knowledge: input/output pattern pairs, desired state space regions and the network size and topology are not known in advance. Results show that, although learning is not smooth, the novel reinforcement learning implementation introduced here is successful and develops an effective control mapping. The self-organising properties of the new decoder allow the neurocontroller to retain previously learned information and to adapt to newly encountered states on-line, throughout its operation. The new decoder increases its information capacity as necessary. The adaptive search element and the adaptive critic element of the original study are retained.
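
As an illustration only, the idea of a decoder that partitions state space from experience and grows as needed can be sketched as below. This is not the architecture proposed in this paper: the class name GrowingDecoder, the Euclidean distance match test, and the vigilance and learning_rate parameters are all assumptions chosen to keep the example short. The sketch assigns each incoming state to the nearest stored prototype, and creates a new prototype whenever no existing one lies within the vigilance radius.

```python
import math


class GrowingDecoder:
    """Hypothetical vigilance-based decoder (an illustrative assumption,
    not the decoder described in the paper).

    Maps a state vector to the index of a state space region and adds a
    new region when no stored prototype is close enough to the state.
    """

    def __init__(self, vigilance=0.5, learning_rate=0.1):
        self.vigilance = vigilance          # assumed match radius
        self.learning_rate = learning_rate  # assumed prototype update rate
        self.prototypes = []                # one prototype per region

    def decode(self, state):
        # Find the stored prototype nearest to the current state.
        best, best_dist = None, math.inf
        for i, prototype in enumerate(self.prototypes):
            dist = math.dist(prototype, state)
            if dist < best_dist:
                best, best_dist = i, dist

        # No acceptable match: allocate a new region, so capacity grows
        # only when a genuinely new part of state space is encountered.
        if best is None or best_dist > self.vigilance:
            self.prototypes.append(list(state))
            return len(self.prototypes) - 1

        # Acceptable match: move the winning prototype slightly toward
        # the state and leave all other prototypes untouched.
        winner = self.prototypes[best]
        for k, x in enumerate(state):
            winner[k] += self.learning_rate * (x - winner[k])
        return best
```

In such a sketch, the returned index would play the role of the active decoder output, so any per-region weights held by the adaptive search element and the adaptive critic element would need to be extended each time a new region is created. Because only the winning prototype is updated, regions learned earlier are not overwritten by later experience.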