A Universal Generalization for Temporal-Difference Learning Using Haar Basis Functions

We propose an algorithm that efficiently implements TD(λ) using the infinite tree of Haar basis functions. The algorithm maintains and updates the infinite tree of coefficients in a finitely compressed form, exploiting the fact that the information obtained from finite training data is itself finite. At each time step, it computes the entire update in time linear in the precision (measured in bits) of each observation. The system of Haar basis functions includes both broad features, which generalize and average strongly, and narrow features, which approximate with high precision. In particular, since it can approximate arbitrary continuous functions on [0, 1) in the limit, TD(λ) with Haar basis functions obtains the best solution for every problem of learning a value function on [0, 1), except that it may converge more slowly than other methods tuned by hand. Universality in this sense is valuable because the main application of TD(λ) is reinforcement learning, where the environment is unknown. The only concern with our method is that its space complexity grows linearly with the number of time steps; however, experimental results show that this causes no problem provided an appropriate forgetting strategy is adopted.
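To make the dyadic structure behind these claims concrete, the following is a minimal sketch (our own illustration, not the paper's implementation): at any point x in [0, 1), exactly one Haar wavelet per resolution level is nonzero, so a b-bit observation activates only b + 1 features, which is what makes a sparse TD(λ) update linear in the precision. The function names `haar_features`, `value`, and `td_lambda_step`, and the tiny-trace forgetting threshold, are illustrative assumptions.

```python
def haar_features(x, bits):
    """Indices and values of the Haar basis functions that are nonzero
    at x in [0, 1), down to dyadic resolution 2**-bits.  Only one wavelet
    per level intersects x, so exactly bits + 1 features are active."""
    feats = [(("phi",), 1.0)]                  # constant (father) function
    for j in range(bits):
        k = int(x * (1 << j))                  # dyadic interval containing x at level j
        mid = (k + 0.5) / (1 << j)             # midpoint of [k/2^j, (k+1)/2^j)
        sign = 1.0 if x < mid else -1.0        # Haar wavelet: +1 on left half, -1 on right
        feats.append((("psi", j, k), sign * 2.0 ** (j / 2)))
    return feats

def value(x, coeffs, bits):
    """V(x) as a sparse inner product; cost is linear in `bits`."""
    return sum(coeffs.get(i, 0.0) * f for i, f in haar_features(x, bits))

def td_lambda_step(coeffs, traces, x, reward, x_next, *,
                   alpha=0.1, gamma=0.9, lam=0.8, bits=16):
    """One TD(lambda) update with accumulating eligibility traces over the
    sparse Haar features.  `traces` grows with the states visited, which is
    why some forgetting strategy (here: dropping tiny entries) is needed."""
    delta = reward + gamma * value(x_next, coeffs, bits) - value(x, coeffs, bits)
    for i in list(traces):
        traces[i] *= gamma * lam               # decay every existing trace
        if abs(traces[i]) < 1e-12:             # crude forgetting of negligible traces
            del traces[i]
    for i, f in haar_features(x, bits):
        traces[i] = traces.get(i, 0.0) + f     # accumulate the active features
    for i, e in traces.items():
        coeffs[i] = coeffs.get(i, 0.0) + alpha * delta * e
```

Storing `coeffs` and `traces` as sparse dictionaries keyed by (level, shift) mirrors the finitely compressed representation of the infinite coefficient tree: entries exist only where training data has touched.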