On Convergence Rate of Adaptive Multiscale Value Function Approximation for Reinforcement Learning

We propose a generic framework for adaptive value function approximation in reinforcement learning based on multiscale approximation. Its two basic ingredients are multiresolution analysis and tree approximation. Starting from simple refinable functions, multiresolution analysis lets us construct a wavelet system from which basis functions are selected adaptively, so that the selected functions form a tree structure. We also establish a convergence rate for this multiscale approximation that does not depend on the regularity of the basis functions.
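As an illustration only, the adaptive selection of basis functions into a tree can be sketched in one dimension. The sketch assumes the Haar system, which arises from the simplest refinable function (the indicator of [0, 1)), and a plain coefficient threshold `eps` as the refinement rule. The names `haar_coeff`, `adaptive_tree`, and `evaluate` are hypothetical, and this is not the algorithm analyzed in the paper.

```python
def haar_coeff(f, j, k, n=64):
    """Approximate <f, psi_{j,k}> on [0, 1) with a midpoint rule (n samples per half).

    psi_{j,k}(x) = 2^(j/2) * psi(2^j x - k), where psi is +1 on [0, 1/2)
    and -1 on [1/2, 1). Assumed setup for illustration only.
    """
    left = k / 2**j
    half = 0.5 / 2**j
    h = half / n
    s_left = sum(f(left + (i + 0.5) * h) for i in range(n)) * h
    s_right = sum(f(left + half + (i + 0.5) * h) for i in range(n)) * h
    return 2 ** (j / 2) * (s_left - s_right)


def adaptive_tree(f, eps, max_level=8):
    """Select Haar indices (j, k) top-down; refine a node only if |coeff| > eps.

    Children are visited only when their parent was kept, so the selected
    index set is a tree rooted at (0, 0).
    """
    tree = {}
    frontier = [(0, 0)]
    while frontier:
        j, k = frontier.pop()
        c = haar_coeff(f, j, k)
        if abs(c) <= eps:
            continue  # prune this node and its whole subtree
        tree[(j, k)] = c
        if j < max_level:
            frontier.extend([(j + 1, 2 * k), (j + 1, 2 * k + 1)])
    return tree


def evaluate(tree, mean, x):
    """Evaluate mean + sum of c_{j,k} * psi_{j,k}(x) over the selected tree."""
    total = mean
    for (j, k), c in tree.items():
        t = 2**j * x - k
        if 0 <= t < 0.5:
            total += c * 2 ** (j / 2)
        elif 0.5 <= t < 1:
            total -= c * 2 ** (j / 2)
    return total


# A step function standing in for a value function with a jump at 0.3.
step = lambda x: 1.0 if x >= 0.3 else 0.0
tree = adaptive_tree(step, eps=1e-3)
```

For a piecewise-constant target such as `step`, only the cells containing the jump receive nonzero coefficients, so the selected indices form a single path from the root rather than a full dyadic grid.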
