Self-modifying reinforcement learning

We describe several experiments with reinforcement learning systems based on the technique of incremental self-improvement (IS). IS uses the success-story algorithm (SSA) to undo unrewarding policy changes computed by self-modifying policies. The experiment demonstrates IS' advantages over stochastic hill climbing and TD Q-learning in noisy environments given limited computational resources.