Paper information - Reinforcement Learning in Markovian and Non-Markovian Environments

Reinforcement Learning in Markovian and Non-Markovian Environments

This work addresses three problems in reinforcement learning and adaptive neuro-control: (1) non-Markovian interfaces between learner and environment, (2) on-line learning based on system realization, and (3) vector-valued adaptive critics. It describes an algorithm based on system realization and on two interacting, fully recurrent, continually running networks that may learn in parallel. Problems with parallel learning are attacked by 'adaptive randomness'. It also describes how interacting model/controller systems can be combined with vector-valued 'adaptive critics' (previous critics have been scalar).

Jürgen Schmidhuber
