Solution of Large Systems of Equations Using Approximate Dynamic Programming Methods

We consider fixed point equations and the approximation of their solution by projection onto a low-dimensional subspace. We propose stochastic iterative algorithms, based on simulation, that converge to the approximate solution and are suitable for large-dimensional problems. We focus primarily on general linear systems and propose extensions of recent approximate dynamic programming methods, based on the use of temporal differences, which solve a projected form of Bellman's equation by means of simulation-based approximations.
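As a concrete illustration of the projected-equation idea described above, the following is a minimal sketch, assuming a linear fixed point equation x = Ax + b and a basis matrix Phi whose columns span the approximation subspace; the example data, projection weights, and sampling scheme are illustrative assumptions, not the paper's exact algorithm or notation. It solves the projected equation Phi r = Pi(A Phi r + b), which reduces to C r = d with C = Phi' Xi (I - A) Phi and d = Phi' Xi b, and compares the exact low-dimensional solution with an LSTD-style simulation-based estimate of C and d.

```python
# Illustrative sketch only: projected-equation solution of x = A x + b on the
# subspace spanned by the columns of Phi, via  C r = d  with
#   C = Phi' Xi (I - A) Phi,   d = Phi' Xi b,
# and a simulation-based (sample-average) approximation of C and d.
# A, b, Phi, the weights xi, and the sampling distribution P are made-up data.

import numpy as np

rng = np.random.default_rng(0)

n, s = 200, 5                                 # original dimension n, subspace dimension s
A = rng.uniform(size=(n, n))
A /= 2.0 * A.sum(axis=1, keepdims=True)       # scale rows so the mapping x -> A x + b contracts
b = rng.standard_normal(n)
Phi = rng.standard_normal((n, s))             # basis matrix (columns span the subspace)
xi = np.full(n, 1.0 / n)                      # projection weights (uniform, for illustration)
Xi = np.diag(xi)

# Exact projected-equation solution, for reference
C = Phi.T @ Xi @ (np.eye(n) - A) @ Phi
d = Phi.T @ Xi @ b
r_exact = np.linalg.solve(C, d)

# Simulation-based approximation: sample row indices i ~ xi, then column
# indices j ~ P[i, :], and correct with importance weights a_ij / p_ij,
# accumulating sample averages of C and d (LSTD-style).
P = A / A.sum(axis=1, keepdims=True)          # per-row sampling distribution
k = 100_000
C_hat = np.zeros((s, s))
d_hat = np.zeros(s)
for _ in range(k):
    i = rng.choice(n, p=xi)
    j = rng.choice(n, p=P[i])
    w = A[i, j] / P[i, j]                     # importance weight
    C_hat += np.outer(Phi[i], Phi[i] - w * Phi[j]) / k
    d_hat += Phi[i] * b[i] / k
r_sim = np.linalg.solve(C_hat, d_hat)

print("exact projected solution:    ", r_exact)
print("simulation-based estimate:   ", r_sim)
```

Only the s-by-s matrix C and the s-vector d are estimated, so the per-sample work does not grow with n; this is what makes simulation-based projected-equation methods attractive for large systems.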

[1]  B. Rozovskii,et al.  Optimal Stopping of Markov Processes , 1978 .

[2]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[3]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[4]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[5]  S. Ioffe,et al.  Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .

[6]  John N. Tsitsiklis,et al.  Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.

[7]  Dimitri P. Bertsekas,et al.  Temporal Dierences-Based Policy Iteration and Applications in Neuro-Dynamic Programming 1 , 1997 .

[8]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[9]  John N. Tsitsiklis,et al.  Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[10]  John N. Tsitsiklis,et al.  Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control..

[11]  Tim Hesterberg,et al.  Monte Carlo Strategies in Scientific Computing , 2002, Technometrics.

[12]  Dimitri P. Bertsekas,et al.  Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..

[13]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[14]  Steven J. Bradtke,et al.  Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.

[15]  Justin A. Boyan,et al.  Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.

[16]  A. Barto,et al.  Improved Temporal Difference Methods with Linear Function Approximation , 2004 .

[17]  Shie Mannor,et al.  Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..

[18]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[19]  David Choi,et al.  A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..

[20]  Shie Mannor,et al.  Automatic basis function construction for approximate dynamic programming and reinforcement learning , 2006, ICML.

[21]  Mario J. Valenti Approximate dynamic programming with applications in multi-agent systems , 2007 .

[22]  D. Bertsekas,et al.  A Least Squares Q-Learning Algorithm for Optimal Stopping Problems , 2007 .

[23]  Lihong Li,et al.  Analyzing feature generation for value-function approximation , 2007, ICML '07.

[24]  Vivek S. Borkar,et al.  A Learning Algorithm for Risk-Sensitive Cost , 2008, Math. Oper. Res..

[25]  Dimitri P. Bertsekas,et al.  Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.

[26]  Richard S. Sutton,et al.  Reinforcement Learning , 1992, Handbook of Machine Learning.