Paper Information - On The Virtues of Linear Learning and Trajectory Distributions

In contrast to recent work by Boyan and Moore, I have obtained excellent results applying conventional value function approximation (VFA) methods to control tasks such as “Mountain Car” and “Acrobot”. The difference in our results is due to the form of function approximator and the nature of the training. I used a local function approximator known as a CMAC, essentially a linear learner using a sparse, coarse-coded representation of the state. Although not a complete solution, I argue that this approach solves the “curse of dimensionality” as well as one can expect, and enables VFA systems to generalize as well as other artificial learning systems. Also, I used TD(λ) and trained on state transitions from actual, experienced trajectories. A theorem by Dayan assures stability of TD(λ) when trained using such trajectory distributions. Gordon, and Tsitsiklis and Van Roy, have recently shown that TD(λ) can be unstable when trained with other distributions. Finally, I present a small compendium of results all showing much faster learning when λ is slightly less than 1.
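
To make the combination described in the abstract concrete, the following is a minimal sketch of linear TD(λ) with a CMAC-style (tile-coded) representation, trained only on transitions from experienced trajectories. It is not the paper's implementation: the one-dimensional random-walk task, the function names (`active_tiles`, `value`, `train`), accumulating eligibility traces, and all parameter values are assumptions chosen for illustration. The Mountain Car and Acrobot experiments are not reproduced here.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
N_TILINGS = 8          # number of overlapping, offset tilings
TILES_PER_TILING = 8   # tiles spanning [0, 1] in each tiling
# One spare tile per tiling covers the part shifted past 1.0 by the offset.
N_FEATURES = N_TILINGS * (TILES_PER_TILING + 1)


def active_tiles(x):
    """Return the index of the single active tile in each tiling for x in [0, 1]."""
    indices = []
    for t in range(N_TILINGS):
        # Each tiling is shifted by a different fraction of one tile width.
        offset = t / (N_TILINGS * TILES_PER_TILING)
        tile = min(int((x + offset) * TILES_PER_TILING), TILES_PER_TILING)
        indices.append(t * (TILES_PER_TILING + 1) + tile)
    return indices


def value(w, x):
    """Linear value estimate: the sum of the weights of the active tiles."""
    return sum(w[i] for i in active_tiles(x))


def train(episodes=2000, alpha=0.05, gamma=1.0, lam=0.9, seed=0):
    """Run TD(lambda) with accumulating traces on a hypothetical random walk.

    The walk starts at 0.5 and moves by a uniform step in [-0.2, 0.2].
    It ends with reward 1 on leaving past 1.0 and reward 0 on leaving past 0.0.
    Every update uses a transition from the trajectory actually experienced.
    """
    rng = random.Random(seed)
    w = [0.0] * N_FEATURES
    step = alpha / N_TILINGS  # spread the step size over the active tiles
    for _ in range(episodes):
        z = [0.0] * N_FEATURES  # eligibility traces, reset each episode
        x = 0.5
        done = False
        while not done:
            x_next = x + rng.uniform(-0.2, 0.2)
            done = x_next <= 0.0 or x_next >= 1.0
            reward = 1.0 if x_next >= 1.0 else 0.0
            v_next = 0.0 if done else value(w, x_next)
            delta = reward + gamma * v_next - value(w, x)  # TD error
            for i in range(N_FEATURES):
                z[i] *= gamma * lam  # decay all traces
            for i in active_tiles(x):
                z[i] += 1.0  # accumulate on the tiles active in x
            for i in range(N_FEATURES):
                w[i] += step * delta * z[i]
            x = x_next
    return w


if __name__ == "__main__":
    weights = train()
    for x in (0.1, 0.5, 0.9):
        print(f"V({x}) = {value(weights, x):.3f}")
```

Because each state activates only `N_TILINGS` features, the value estimate is a sparse linear function of the weights, and neighbouring states share tiles, which is the source of local generalization. Rerunning `train` with different `lam` values is one way to explore the sensitivity to λ that the abstract mentions, though this toy task is not expected to match the paper's results.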

R. Sutton