Paper Information - Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching

To date, reinforcement learning has mostly been studied on simple learning tasks, and the methods studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and 2) to investigate methods that will speed up reinforcement learning.

This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of the different frameworks, a dynamic environment was used as a testbed. The environment is moderately complex and nondeterministic. This paper describes these frameworks and algorithms in detail and presents an empirical evaluation of the frameworks.
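
As a rough illustration of two of the ideas named in the abstract, Q-learning and experience replay, the sketch below trains a tabular agent on a toy task. Several things here are assumptions made for illustration and are not taken from the paper: the five-state chain environment, the lookup table in place of a connectionist network, the hyperparameter values, and the choice to replay each stored episode once in reverse order. The names `step`, `q_update`, and `train` are hypothetical.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2   # illustrative values only


def step(state, action):
    """Toy deterministic environment (an assumption, not the paper's testbed)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_update(q, s, a, r, s2, done):
    """One-step Q-learning backup toward r + gamma * max_a' Q(s', a')."""
    target = r if done else r + GAMMA * max(q[s2])
    q[s][a] += ALPHA * (target - q[s][a])


def train(episodes=200, replay_passes=1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done, episode = 0, False, []
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[s])
                a = rng.choice([x for x in ACTIONS if q[s][x] == best])
            s2, r, done = step(s, a)
            q_update(q, s, a, r, s2, done)        # learn from the live step
            episode.append((s, a, r, s2, done))   # remember the experience
            s = s2
        # Experience replay: present the stored experiences to the learner
        # again, here in reverse order so the goal reward propagates back.
        for _ in range(replay_passes):
            for exp in reversed(episode):
                q_update(q, *exp)
    return q


q = train()
# Greedy action in each non-terminal state after training.
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Setting `replay_passes=0` gives plain one-step Q-learning on the same task, which makes it easy to compare how quickly the value estimates propagate with and without replay in this toy setting.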

Long-Ji Lin
