Approximate Dynamic Programming Based on Expansive Projections

We present a general method for obtaining convergent approximate value iteration algorithms with function approximation. The result applies to arbitrary approximation architectures and generalizes existing results in the literature derived for particular approximation schemes. Additionally, we show how to obtain a convergent approximate mapping whose fixed point is the projection, in the approximation space, of a fixed point of the exact dynamic programming mapping with respect to a suitable subset norm. This result relies on evaluating the difference between successive iterates in the selected subset norm, which yields convergent procedures for arbitrary approximation architectures.
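To make the setting concrete, the following is a minimal sketch (not the paper's construction) of approximate value iteration V_{k+1} = Π(T V_k), where T is the Bellman optimality operator and Π projects onto an approximation space. Here Π is a state-aggregation averager, which is non-expansive in the sup norm, so the composite Π∘T contracts with modulus equal to the discount factor and the iterates converge to a unique fixed point in the approximation space. All MDP numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

gamma = 0.9
n_states = 4

# Toy MDP with two actions: "advance" (deterministic move to the next
# state) and "stay". Transition kernels P[a, s, s'] and rewards r[a, s]
# are made-up numbers for illustration only.
P = np.zeros((2, n_states, n_states))
for s in range(n_states):
    P[0, s, (s + 1) % n_states] = 1.0  # action 0: move to next state
    P[1, s, s] = 1.0                   # action 1: remain in place
r = np.array([[0.0, 0.0, 0.0, 1.0],    # reward for action 0 in each state
              [0.1, 0.1, 0.1, 0.1]])   # reward for action 1 in each state

# Aggregation projection: average values within groups {0,1} and {2,3}.
# Each row is a convex combination, so ||Pi V - Pi W||_inf <= ||V - W||_inf
# (Pi is non-expansive in the sup norm).
Proj = np.array([[0.5, 0.5, 0.0, 0.0],
                 [0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5],
                 [0.0, 0.0, 0.5, 0.5]])

def bellman(V):
    """Bellman optimality operator: (T V)(s) = max_a [r(s,a) + gamma E V]."""
    return np.max(r + gamma * (P @ V), axis=0)

# Projected value iteration: since Pi is non-expansive and T is a
# gamma-contraction in the sup norm, Pi∘T is a gamma-contraction and the
# loop below converges geometrically to its unique fixed point.
V = np.zeros(n_states)
for _ in range(1000):
    V_new = Proj @ bellman(V)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new
```

With a general approximation architecture the projection need not be non-expansive in the sup norm (e.g. orthogonal least-squares projections), and convergence can fail; the point of the abstract is to obtain convergent mappings without restricting the architecture, by working with a suitable subset norm instead.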
