Speeding up Q(λ)-Learning

Q(λ)-learning uses TD(λ) methods to accelerate Q-Learning. The worst-case complexity of a single update step in previous online, lookup-table-based Q(λ) implementations is bounded by the size of the state/action space. Our faster algorithm's worst-case complexity is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
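To make the postponement idea concrete, the following is a minimal sketch (not the paper's exact pseudocode) of lazy Q(λ) updates with replacing traces: a single global accumulator of discounted TD errors replaces the per-step decay of all eligibility traces, and a Q-value is brought up to date only when it is read, so each step touches only the entries of the current state's actions. All names and details (the class `LazyQLambda`, the accumulator bookkeeping, the omission of the paper's numerical rescaling of the accumulator) are illustrative assumptions.

```python
from collections import defaultdict

class LazyQLambda:
    """Sketch of lazy (postponed) Q(lambda) updates with replacing traces.

    Instead of decaying every eligibility trace each step (cost ~|S|*|A|),
    keep one global accumulator of discounted TD errors and apply the
    pending corrections to a Q-value only when it is actually read.
    Per-step cost is then bounded by the number of actions.  The global
    accumulator would need periodic rescaling in practice (omitted here).
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, lam=0.9):
        self.actions = actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)   # stored (possibly stale) Q-values
        self.trace = {}               # trace value at the last visit of (s, a)
        self.snap = {}                # (accumulator, power) at the last visit
        self.acc = 0.0                # sum over k of delta_k * (gamma*lam)^k
        self.power = 1.0              # (gamma*lam)^t for the current step t

    def _refresh(self, s, a):
        """Apply all postponed updates to Q(s, a) and return its current value."""
        key = (s, a)
        if key in self.trace:
            acc0, pow0 = self.snap[key]
            # accumulated discounted TD errors since the last visit of (s, a)
            self.q[key] += self.alpha * self.trace[key] * (self.acc - acc0) / pow0
            self.snap[key] = (self.acc, self.power)
        return self.q[key]

    def value(self, s):
        # only the |A| entries of the current state are refreshed -> O(|A|)
        return max(self._refresh(s, a) for a in self.actions)

    def update(self, s, a, r, s_next):
        # TD error computed from up-to-date Q-values of the pairs involved
        delta = r + self.gamma * self.value(s_next) - self._refresh(s, a)
        # advance the global decay power and accumulator: O(1) bookkeeping
        self.power *= self.gamma * self.lam
        self.acc += delta * self.power
        # replacing trace for the visited pair, plus its immediate update
        self.trace[(s, a)] = 1.0
        self.snap[(s, a)] = (self.acc, self.power)
        self.q[(s, a)] += self.alpha * delta
```

A caller would invoke `update(s, a, r, s_next)` once per transition and `value(s)` (or `_refresh`) for action selection; every Q-value read through these methods reflects all postponed corrections, while pairs that are never revisited are simply never touched again.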