Convergent Reinforcement Learning with Value Function Interpolation

We analyze the convergence of a class of reinforcement learning algorithms combined with value-function interpolation, building on the techniques developed by Littman and Szepesvári (1996). As a special case of the general results, we obtain the first proof of almost-sure convergence of Q-learning combined with value-function interpolation over uncountable state spaces.

[1] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.

[2] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.

[3] Andrew W. Moore, et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 2004, Machine Learning.

[4] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.

[5] Andrew Tridgell, et al. KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search, 1998, ICML.

[6] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.

[7] Leslie Pack Kaelbling, et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons, 1991, IJCAI.

[8] Csaba Szepesvári, et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms, 1999, Neural Computation.

[9] Thomas G. Dietterich, et al. Efficient Value Function Approximation Using Regression Trees, 1999.

[10] Michael I. Jordan, et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 2001.

[11] Rémi Munos, et al. Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems, 1997, ECML.

[12] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.

[13] Jonathan Baxter. KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search, 1998.

[14] Leslie Pack Kaelbling, et al. Practical Reinforcement Learning in Continuous Spaces, 2000, ICML.

[15] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.

[16] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.

[17] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.

[18] R. Bellman, et al. Functional Approximations and Dynamic Programming, 1959.

[19] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.

[20] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.

[21] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.