Paper Information - On Discontinuous Q-Functions in Reinforcement Learning

On Discontinuous Q-Functions in Reinforcement Learning

This paper considers the application of reinforcement learning to path-finding tasks in continuous state spaces in the presence of obstacles. We show that cumulative evaluation functions (such as Q-functions [28] and V-functions [4]) may be discontinuous if forbidden regions (as implied by obstacles) exist in the state space. Because the infinite number of states requires the use of function approximators such as backpropagation nets [16, 12, 24], we argue that these discontinuities make cumulative evaluation functions severely difficult to learn. The discontinuities we detected might also explain why recent applications of reinforcement learning systems to complex tasks [12] failed to show the desired performance. In our conclusion, we outline some ideas to circumvent the problem.
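The kind of discontinuity described in the abstract can be illustrated with a small toy example. The sketch below is not the paper's construction: it assumes a hypothetical 11x11 grid as a coarse stand-in for a continuous state space, a thin wall with a gap near the top as the forbidden region, and the number of steps to the goal as the cumulative evaluation function. The function name `cost_to_go` and the layout are illustrative assumptions.

```python
from collections import deque


def cost_to_go(width, height, goal, blocked):
    """Breadth-first search from the goal.

    Returns a dict mapping each reachable free cell to the number of
    unit steps (4-connected moves) needed to reach the goal.
    """
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in blocked and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist


# Hypothetical layout (an assumption for illustration only):
# a wall along column x = 5 for y = 0..8, leaving a gap at y = 9 and y = 10.
WIDTH, HEIGHT = 11, 11
wall = {(5, y) for y in range(0, 9)}
goal = (7, 0)

dist = cost_to_go(WIDTH, HEIGHT, goal, wall)

# Two states that are only two cells apart, on opposite sides of the wall.
print(dist[(6, 0)])  # 1: the goal is the next cell
print(dist[(4, 0)])  # 21: the path must detour through the gap at y = 9
```

The two queried states are close in state space, yet their values differ by 20 steps, because one of them can only reach the goal through the gap. A smooth function approximator trained on such targets has to represent this jump across the wall, which is the difficulty the abstract points to.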

Alexander Linden | A. Linden

[1] Charles W. Anderson, et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning), 1986.

[2] Marvin Minsky, et al. Steps toward Artificial Intelligence, 1961, Proceedings of the IRE.

[3] Patchigolla Kiran Kumar, et al. A Survey of Some Results in Stochastic Adaptive Control, 1985.

[4] A. Barto, et al. Learning and Sequential Decision Making, 1989.

[5] P. J. Werbos, et al. Backpropagation and neurocontrol: a review and prospectus, 1989, International 1989 Joint Conference on Neural Networks.

[6] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.

[7] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.

[8] Chris Watkins, et al. Learning from delayed rewards, 1989.

[9] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[10] Sebastian Thrun, et al. Efficient Exploration In Reinforcement Learning, 1992.

[11] F. J. Śmieja, et al. Multiple Network Systems (Minos) Modules: Task Division and Module Discrimination, 1991.