论文信息 - Analyzing feature generation for value-function approximation

Analyzing feature generation for value-function approximation

We analyze a simple, Bellman-error-based approach to generating basis functions for value-function approximation. We show that it generates orthogonal basis functions that provably tighten approximation error bounds. We also illustrate the use of this approach in the presence of noise on some sample problems.

[1] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..

[2] Sridhar Mahadevan,et al. Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions , 2005, NIPS.

[3] Alexander J. Smola,et al. Support Vector Method for Function Approximation, Regression Estimation and Signal Processing , 1996, NIPS.

[4] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[5] Andrew W. Moore,et al. Locally Weighted Learning , 1997, Artificial Intelligence Review.

[6] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .

[7] Sridhar Mahadevan,et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..

[8] Shie Mannor,et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning , 2006, ICML.

[9] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.

[10] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.

[11] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[12] Dimitri P. Bertsekas,et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.

[13] Daphne Koller,et al. Computing Factored Value Functions for Policies in Structured MDPs , 1999, IJCAI.

[14] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[15] Stéphane Mallat,et al. Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[16] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 2005, Machine Learning.