A Universal Generalization for Temporal-Difference Learning Using Haar Basis Functions

We propose an algorithm that efficiently implements TD(λ) using the infinite tree of Haar basis functions. The algorithm maintains and updates the infinite tree of coefficients in a finitely compressed form, exploiting the fact that the information obtained from finite training data is itself finite. At each time step, it computes the entire update in time linear in the precision (measured in bits) of each observation. The system of Haar basis functions includes both broad features, which generalize and average strongly, and narrow features, which approximate with high precision. In particular, since it can approximate arbitrary continuous functions on [0, 1) in the limit, TD(λ) with Haar basis functions obtains the best solution for every problem of learning a value function on [0, 1), except that it may converge more slowly than other methods tuned by hand. Universality in this sense is valuable because the main application of TD(λ) is reinforcement learning, where the environment is unknown. The only concern with our method is that its space complexity grows linearly with the number of time steps; however, experimental results show that this causes no problem provided an appropriate forgetting strategy is adopted.
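To make the dyadic structure behind these claims concrete, the following is a minimal sketch (our own illustration, not the paper's implementation): at any point x in [0, 1), exactly one Haar wavelet per resolution level is nonzero, so a b-bit observation activates only b + 1 features, which is what makes a sparse TD(λ) update linear in the precision. The function names `haar_features`, `value`, and `td_lambda_step`, and the tiny-trace forgetting threshold, are illustrative assumptions.

```python
def haar_features(x, bits):
    """Indices and values of the Haar basis functions that are nonzero
    at x in [0, 1), down to dyadic resolution 2**-bits.  Only one wavelet
    per level intersects x, so exactly bits + 1 features are active."""
    feats = [(("phi",), 1.0)]                  # constant (father) function
    for j in range(bits):
        k = int(x * (1 << j))                  # dyadic interval containing x at level j
        mid = (k + 0.5) / (1 << j)             # midpoint of [k/2^j, (k+1)/2^j)
        sign = 1.0 if x < mid else -1.0        # Haar wavelet: +1 on left half, -1 on right
        feats.append((("psi", j, k), sign * 2.0 ** (j / 2)))
    return feats

def value(x, coeffs, bits):
    """V(x) as a sparse inner product; cost is linear in `bits`."""
    return sum(coeffs.get(i, 0.0) * f for i, f in haar_features(x, bits))

def td_lambda_step(coeffs, traces, x, reward, x_next, *,
                   alpha=0.1, gamma=0.9, lam=0.8, bits=16):
    """One TD(lambda) update with accumulating eligibility traces over the
    sparse Haar features.  `traces` grows with the states visited, which is
    why some forgetting strategy (here: dropping tiny entries) is needed."""
    delta = reward + gamma * value(x_next, coeffs, bits) - value(x, coeffs, bits)
    for i in list(traces):
        traces[i] *= gamma * lam               # decay every existing trace
        if abs(traces[i]) < 1e-12:             # crude forgetting of negligible traces
            del traces[i]
    for i, f in haar_features(x, bits):
        traces[i] = traces.get(i, 0.0) + f     # accumulate the active features
    for i, e in traces.items():
        coeffs[i] = coeffs.get(i, 0.0) + alpha * delta * e
```

Storing `coeffs` and `traces` as sparse dictionaries keyed by (level, shift) mirrors the finitely compressed representation of the infinite coefficient tree: entries exist only where training data has touched.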