Paper information - Self-modifying reinforcement learning

Self-modifying reinforcement learning

We describe several experiments with reinforcement learning systems based on the technique of incremental self-improvement (IS). IS uses the success-story algorithm (SSA) to undo unrewarding policy changes computed by self-modifying policies. The experiments demonstrate the advantages of IS over stochastic hill climbing and TD Q-learning in noisy environments given limited computational resources.
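The undo step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Checkpoint`, `SSAStack`, `modify`, and `ssa_call` are hypothetical, the policy is assumed to be a plain dictionary, and the check is a simplified form of the success-story criterion in which each surviving modification must show a higher reward per time step since it was made than the modification before it (or than the whole run, for the oldest one).

```python
from dataclasses import dataclass, field

_MISSING = object()  # marks policy entries that did not exist before a change


@dataclass
class Checkpoint:
    time: float       # time at which the modification was made
    reward: float     # cumulative reward at that time
    old_values: dict  # overwritten policy entries, kept so the change can be undone


@dataclass
class SSAStack:
    """Hypothetical container: a policy plus a stack of undoable modifications."""
    policy: dict
    stack: list = field(default_factory=list)

    def modify(self, time, reward, changes):
        # Save the entries about to be overwritten, then apply the change.
        old = {k: self.policy.get(k, _MISSING) for k in changes}
        self.stack.append(Checkpoint(time, reward, old))
        self.policy.update(changes)

    def ssa_call(self, now, reward):
        # Assumes `now` is strictly greater than every checkpoint time and > 0.
        while self.stack:
            top = self.stack[-1]
            if len(self.stack) > 1:
                prev_time, prev_reward = self.stack[-2].time, self.stack[-2].reward
            else:
                prev_time, prev_reward = 0.0, 0.0  # baseline: the whole run so far
            rate_top = (reward - top.reward) / (now - top.time)
            rate_prev = (reward - prev_reward) / (now - prev_time)
            if rate_top > rate_prev:
                break  # the most recent change still looks useful; stop undoing
            # Otherwise undo the most recent change and re-check the one below it.
            self.stack.pop()
            for key, value in top.old_values.items():
                if value is _MISSING:
                    self.policy.pop(key, None)
                else:
                    self.policy[key] = value
```

In this sketch, a modification that does not speed up reward intake relative to its predecessor is popped and its saved entries are restored, so the stack that remains is a history of changes that each appear to have helped so far.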

Jieyu Zhao

[1] Isabelle Guyon, et al. Comparison of classifier methods: a case study in handwritten digit recognition, 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5).

[2] Geoffrey E. Hinton, et al. Learning and relearning in Boltzmann machines, 1986.

[3] Martin Wattenberg, et al. Stochastic Hillclimbing as a Baseline Method for Evaluating Genetic Algorithms, 1995, NIPS.

[4] Russell Greiner, et al. PALO: A Probabilistic Hill-Climbing Algorithm, 1996, Artif. Intell.

[5] Jieyu Zhao, et al. Direct Policy Search and Uncertain Policy Evaluation, 1998.

[6] Jürgen Schmidhuber, et al. Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement, 1997, Machine Learning.