Paper Information - Temporal Difference Learning in Continuous Time and Space

Temporal Difference Learning in Continuous Time and Space

A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a pendulum with limited torque. Both the "critic" that specifies the paths to the upright position and the "actor" that works as a nonlinear feedback controller were successfully implemented by radial basis function (RBF) networks.

M. Hasselmo | M. Mozer

[1] Arthur E. Bryson, et al. Applied Optimal Control, 1969.

[2] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[3] J. J. Hopfield, et al. Neurons with graded response have collective computational properties like those of two-state neurons, 1984, Proceedings of the National Academy of Sciences of the United States of America.

[4] Vijaykumar Gullapalli, et al. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.

[5] Joel L. Davis, et al. A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement, 1994.

[6] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.

[7] Andrew G. Barto, et al. Adaptive linear quadratic control using policy iteration, 1994, Proceedings of 1994 American Control Conference - ACC '94.

[8] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.