Paper information - Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function. Moreover, we characterise the different solutions as images of the optimal exact value function under different projection operations. The results presented here will be useful for comparing the algorithms in terms of the error they achieve relative to the error of the optimal approximate solution.
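The idea that different algorithms optimise different objectives can be illustrated with a small numerical sketch. The example below is not taken from the paper: the three-state chain, the single feature, the weights `D`, and all names (`apply_a`, `wdot`, `w_td`, `w_br`, `w_direct`) are assumptions chosen for illustration. It compares three weights: the TD fixed point, the Bellman-residual minimiser, and the direct least-squares fit to the exact value function. Writing A = I - γP and R = A V, each weight is a linear image of the exact value function V.

```python
# Toy policy-evaluation problem; every number here is illustrative.
GAMMA = 0.9
P = [[0.5, 0.5, 0.0],   # transition matrix under the fixed policy
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
D = [1 / 3, 1 / 3, 1 / 3]   # state weights (uniform is stationary for this P)
PHI = [1.0, 2.0, 3.0]       # one feature per state, so the weight w is a scalar
V_TRUE = [1.0, 0.0, 3.0]    # exact value function, picked by hand


def apply_a(x):
    """Return (I - GAMMA * P) x."""
    return [xi - GAMMA * sum(p * xj for p, xj in zip(row, x))
            for xi, row in zip(x, P)]


def wdot(x, y):
    """D-weighted inner product x^T D y."""
    return sum(d * xi * yi for d, xi, yi in zip(D, x, y))


# Choose the reward so that V_TRUE solves the Bellman equation: R = A V.
R = apply_a(V_TRUE)
A_PHI = apply_a(PHI)

# Direct fit: minimises ||V - PHI w||_D^2 (orthogonal projection of V).
w_direct = wdot(PHI, V_TRUE) / wdot(PHI, PHI)
# TD fixed point: solves PHI^T D (R - A PHI w) = 0.
w_td = wdot(PHI, R) / wdot(PHI, A_PHI)
# Bellman-residual solution: minimises ||R - A PHI w||_D^2.
w_br = wdot(A_PHI, R) / wdot(A_PHI, A_PHI)


def bellman_residual(w):
    """Residual R - A PHI w of the Bellman equation at weight w."""
    return [r - w * a for r, a in zip(R, A_PHI)]


def bellman_error_sq(w):
    b = bellman_residual(w)
    return wdot(b, b)


def value_error_sq(w):
    e = [v - w * f for v, f in zip(V_TRUE, PHI)]
    return wdot(e, e)


for name, w in [("direct", w_direct), ("TD", w_td), ("residual", w_br)]:
    print(name, w, value_error_sq(w), bellman_error_sq(w))
```

In this toy setting the three weights differ, and each one is best under its own criterion: `w_direct` has the smallest value error, `w_br` the smallest Bellman error, and `w_td` makes the Bellman residual orthogonal to the feature under the `D`-weighted inner product.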

Ralf Schoknecht
