Reinforcement Learning when Results are Delayed and Interleaved in Time
Many real-world problems involve sequences in which an automaton executes an action but the results of that action become apparent only after some delay. A system is presented that learns to associate early stimuli with later reinforcement by buffering unfamiliar input images until that reinforcement arrives. It is shown to learn to predict the immediate results of various actions in a given state, to avoid entering negative next-states, and also to avoid entering positive next-states that lead in turn only to negative states. The system can learn across indefinitely long reinforcement delays while buffering only a small number of past states locally at the nodes.
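The buffering idea can be illustrated with a minimal sketch. This is not the system described in the paper: it assumes a simple tabular learner, and the names `DelayedRewardLearner`, `buffer_size`, and `step` are hypothetical. It also buffers every recent state-action pair rather than only unfamiliar inputs, which is a simplification.

```python
from collections import defaultdict, deque


class DelayedRewardLearner:
    """Hypothetical tabular learner for illustration only."""

    def __init__(self, buffer_size=3, step=0.5):
        # (state, action) -> running estimate of the eventual reinforcement
        self.value = defaultdict(float)
        # Small bounded buffer of pairs still waiting for reinforcement;
        # the oldest pair is dropped once the buffer is full.
        self.pending = deque(maxlen=buffer_size)
        self.step = step

    def act(self, state, action):
        # Hold the pair until a reinforcement signal arrives.
        self.pending.append((state, action))

    def reinforce(self, signal):
        # Move each buffered estimate toward the signal, then clear the buffer.
        for key in self.pending:
            self.value[key] += self.step * (signal - self.value[key])
        self.pending.clear()

    def best_action(self, state, actions):
        # Prefer the action with the highest estimate in this state.
        return max(actions, key=lambda a: self.value[(state, a)])


learner = DelayedRewardLearner()
learner.act("s0", "left")
learner.act("s1", "left")
learner.reinforce(-1.0)  # delayed negative reinforcement reaches both pairs
choice = learner.best_action("s0", ["left", "right"])
```

After the delayed negative signal, the estimate for taking `left` in `s0` falls below the untouched default for `right`, so the sketch steers away from the action that led to the negative outcome.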