Generalization in Reinforcement Learning: Safely Approximating the Value Function

A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust and, even in very benign cases, may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
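The claim that swapping the lookup table for a function approximator removes the convergence guarantee can be seen on a very small problem. The sketch below uses a standard two-state counterexample from the later literature, not one of this paper's experiments, and it does not implement Grow-Support. Assumptions: state s1 moves to s2, s2 loops on itself, all rewards are 0 (so the true value of both states is 0), and the approximator is the hypothetical linear model V(s) = PHI[s] * w with features 1 and 2, refit by least squares after every backup. The names `GAMMA`, `PHI`, `tabular_vi` and `fitted_vi` are illustrative only.

```python
"""Value iteration on a two-state chain: lookup table vs. linear approximator.

Assumed toy problem (not from the paper): s1 -> s2, s2 -> s2, every reward
is 0, so the true value of both states is 0.
"""

GAMMA = 0.9          # discount factor; the fitted version diverges if GAMMA > 5/6
PHI = [1.0, 2.0]     # hypothetical features: V(s1) = 1*w, V(s2) = 2*w
NEXT = [1, 1]        # deterministic successor index of each state
REWARD = [0.0, 0.0]  # zero reward everywhere


def tabular_vi(v, sweeps):
    """Exact value iteration with a lookup table (one entry per state)."""
    for _ in range(sweeps):
        v = [REWARD[s] + GAMMA * v[NEXT[s]] for s in range(2)]
    return v


def fitted_vi(w, sweeps):
    """Value iteration where each Bellman backup is projected back onto
    V(s) = PHI[s] * w by least squares over both states."""
    for _ in range(sweeps):
        # Backed-up targets computed from the current approximate values.
        targets = [REWARD[s] + GAMMA * PHI[NEXT[s]] * w for s in range(2)]
        # Closed-form 1-D least squares: w = sum(phi * y) / sum(phi^2).
        w = sum(p * y for p, y in zip(PHI, targets)) / sum(p * p for p in PHI)
    return w


if __name__ == "__main__":
    print(tabular_vi([1.0, 2.0], 100))  # both entries shrink toward 0
    print(fitted_vi(1.0, 100))          # w is multiplied by 6*GAMMA/5 each sweep
```

Each fitted sweep maps w to (6 * GAMMA / 5) * w, so with GAMMA = 0.9 the weight grows by 8% per iteration while the lookup-table version contracts by a factor of GAMMA. Every individual backup is harmless; it is the least-squares projection that amplifies the error, which is the kind of fragility the abstract describes.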

Andrew W. Moore | Justin A. Boyan