Reinforcement Learning when Results are Delayed and Interleaved in Time

Many real-world problems involve sequences in which an automaton executes an action but there is some delay before the results of that action become apparent. A system is presented which learns to associate early stimuli with later reinforcement by buffering unfamiliar input images until that reinforcement arrives. It is shown to learn to predict the immediate results of various actions in a given state, to avoid entering negative next-states, and to avoid entering positive next-states which lead in turn only to negative states. The system is capable of learning across indefinitely long reinforcement delays while buffering only a small number of past states locally at the nodes.
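To make the buffering idea concrete, the following is a minimal sketch (in Python) of one possible realization: unfamiliar stimuli are held in a small bounded buffer, and when a delayed reinforcement signal eventually arrives, each buffered stimulus is credited with that outcome. The class name, the familiarity test, and the incremental update rule here are illustrative assumptions, not the paper's actual architecture.

    from collections import deque

    class DelayedReinforcementLearner:
        """Sketch only: buffer unfamiliar stimuli until reinforcement arrives,
        then associate each buffered stimulus with that reinforcement.
        Names and the update rule are assumptions for illustration."""

        def __init__(self, buffer_size=5, learning_rate=0.1):
            self.buffer = deque(maxlen=buffer_size)  # small local buffer of past states
            self.values = {}                         # stimulus -> predicted reinforcement
            self.lr = learning_rate

        def observe(self, stimulus):
            """Store a stimulus only if it is unfamiliar (no learned prediction yet)."""
            if stimulus not in self.values:
                self.buffer.append(stimulus)

        def reinforce(self, reward):
            """When delayed reinforcement arrives, credit it to the buffered stimuli."""
            while self.buffer:
                s = self.buffer.popleft()
                old = self.values.get(s, 0.0)
                self.values[s] = old + self.lr * (reward - old)

        def predict(self, stimulus):
            """Predicted result of encountering this stimulus."""
            return self.values.get(stimulus, 0.0)

    if __name__ == "__main__":
        learner = DelayedReinforcementLearner()
        learner.observe("light")          # early stimulus, outcome not yet known
        learner.observe("tone")
        learner.reinforce(-1.0)           # delayed negative result arrives later
        print(learner.predict("light"))   # now predicts a negative outcome

Because the buffer is bounded and only unfamiliar stimuli are stored, a sketch of this kind can span long reinforcement delays without retaining the full history of past states.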