Convergent Reinforcement Learning with Value Function Interpolation

We analyze the convergence of a class of reinforcement learning algorithms combined with value-function interpolation, building on the techniques developed by Littman and Szepesvári (1996). As a special case of the general results, we obtain the first proof of almost-sure convergence of Q-learning combined with value-function interpolation over uncountable state spaces.

[1] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.

[2] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.

[3] Andrew W. Moore, et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 2004, Machine Learning.

[4] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.

[5] Andrew Tridgell, et al. KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search, 1998, ICML.

[6] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.

[7] Leslie Pack Kaelbling, et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons, 1991, IJCAI.

[8] Csaba Szepesvári, et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms, 1999, Neural Computation.

[9] Thomas G. Dietterich, et al. Efficient Value Function Approximation Using Regression Trees, 1999.

[10] Michael I. Jordan, et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 2001.

[11] Rémi Munos, et al. Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems, 1997, ECML.

[12] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.

[13] Jonathan Baxter. KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search, 1998.

[14] Leslie Pack Kaelbling, et al. Practical Reinforcement Learning in Continuous Spaces, 2000, ICML.

[15] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.

[16] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.

[17] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.

[18] R. Bellman, et al. Functional Approximations and Dynamic Programming, 1959.

[19] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.

[20] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.

[21] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.