Paper Information - Issues in Using Function Approximation for Reinforcement Learning

Reinforcement learning techniques address the problem of learning to select actions in unknown, dynamic environments. It is widely acknowledged that to be of use in complex domains, reinforcement learning techniques must be combined with generalizing function approximation methods such as artificial neural networks. Little, however, is understood about the theoretical properties of such combinations, and many researchers have encountered failures in practice. In this paper we identify a prime source of such failures: a systematic overestimation of utility values. Using Watkins’ Q-Learning [18] as an example, we give a theoretical account of the phenomenon, deriving conditions under which one may expect it to cause learning to fail. Employing some of the most popular function approximators, we present experimental results which support the theoretical findings.
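The overestimation described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper's experiments. It assumes, purely for illustration, that all actions in a state have the same true value and that the function approximator adds independent, zero-mean noise drawn uniformly from [-eps, eps] to each action's estimate. Under that assumed noise model, the maximum over n noisy estimates exceeds the true maximum by eps * (n - 1) / (n + 1) on average, even though each individual estimate is unbiased. The function names `expected_max_error` and `closed_form` are hypothetical.

```python
import random


def expected_max_error(n_actions, eps, trials=50_000, seed=0):
    """Monte Carlo estimate of E[max_a noise_a].

    Assumption (illustrative only): each action's estimation error is
    independent and uniform on [-eps, eps], so it has zero mean.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The true values are taken to be equal, so the error of the
        # max-based target is simply the largest of the noise terms.
        total += max(rng.uniform(-eps, eps) for _ in range(n_actions))
    return total / trials


def closed_form(n_actions, eps):
    """Mean of the maximum of n independent uniform draws on [-eps, eps]."""
    return eps * (n_actions - 1) / (n_actions + 1)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        simulated = expected_max_error(n, eps=1.0)
        print(f"n={n:2d}  simulated={simulated:.3f}  closed form={closed_form(n, 1.0):.3f}")
```

With a single action the bias vanishes, and it grows toward eps as the number of actions increases. In a Q-Learning update this bias enters the target through the max over next-state action values, scaled by the discount factor.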

Sebastian Thrun | Anton Schwartz

[1] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.

[2] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[3] C. Watkins. Learning from delayed rewards, 1989.

[4] Andrew W. Moore, et al. Efficient memory-based learning for robot control, 1990.

[5] Leslie Pack Kaelbling, et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons, 1991, IJCAI.

[6] Steven J. Bradtke, et al. Reinforcement Learning Applied to Linear Quadratic Regulation, 1992, NIPS.

[7] Vijaykumar Gullapalli, et al. Reinforcement learning and its application to control, 1992.

[8] Sebastian Thrun, et al. Explanation-Based Neural Network Learning for Robot Control, 1992, NIPS.

[9] Sven Koenig, et al. Complexity Analysis of Real-Time Reinforcement Learning, 1992, AAAI.

[10] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.

[11] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.