Solution of Large Systems of Equations Using Approximate Dynamic Programming Methods

We consider fixed point equations and the approximation of their solution by projection onto a low-dimensional subspace. We propose stochastic iterative algorithms, based on simulation, that converge to the approximate solution and are suitable for large-dimensional problems. We focus primarily on general linear systems and propose extensions of recent approximate dynamic programming methods, based on the use of temporal differences, which solve a projected form of Bellman's equation by means of simulation-based approximations.
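As a concrete illustration of the projected-equation idea described above, the following is a minimal sketch, assuming a linear fixed point equation x = Ax + b and a basis matrix Phi whose columns span the approximation subspace; the example data, projection weights, and sampling scheme are illustrative assumptions, not the paper's exact algorithm or notation. It solves the projected equation Phi r = Pi(A Phi r + b), which reduces to C r = d with C = Phi' Xi (I - A) Phi and d = Phi' Xi b, and compares the exact low-dimensional solution with an LSTD-style simulation-based estimate of C and d.

```python
# Illustrative sketch only: projected-equation solution of x = A x + b on the
# subspace spanned by the columns of Phi, via  C r = d  with
#   C = Phi' Xi (I - A) Phi,   d = Phi' Xi b,
# and a simulation-based (sample-average) approximation of C and d.
# A, b, Phi, the weights xi, and the sampling distribution P are made-up data.

import numpy as np

rng = np.random.default_rng(0)

n, s = 200, 5                                 # original dimension n, subspace dimension s
A = rng.uniform(size=(n, n))
A /= 2.0 * A.sum(axis=1, keepdims=True)       # scale rows so the mapping x -> A x + b contracts
b = rng.standard_normal(n)
Phi = rng.standard_normal((n, s))             # basis matrix (columns span the subspace)
xi = np.full(n, 1.0 / n)                      # projection weights (uniform, for illustration)
Xi = np.diag(xi)

# Exact projected-equation solution, for reference
C = Phi.T @ Xi @ (np.eye(n) - A) @ Phi
d = Phi.T @ Xi @ b
r_exact = np.linalg.solve(C, d)

# Simulation-based approximation: sample row indices i ~ xi, then column
# indices j ~ P[i, :], and correct with importance weights a_ij / p_ij,
# accumulating sample averages of C and d (LSTD-style).
P = A / A.sum(axis=1, keepdims=True)          # per-row sampling distribution
k = 100_000
C_hat = np.zeros((s, s))
d_hat = np.zeros(s)
for _ in range(k):
    i = rng.choice(n, p=xi)
    j = rng.choice(n, p=P[i])
    w = A[i, j] / P[i, j]                     # importance weight
    C_hat += np.outer(Phi[i], Phi[i] - w * Phi[j]) / k
    d_hat += Phi[i] * b[i] / k
r_sim = np.linalg.solve(C_hat, d_hat)

print("exact projected solution:    ", r_exact)
print("simulation-based estimate:   ", r_sim)
```

Only the s-by-s matrix C and the s-vector d are estimated, so the per-sample work does not grow with n; this is what makes simulation-based projected-equation methods attractive for large systems.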

[1]  B. Rozovskii,et al.  Optimal Stopping of Markov Processes , 1978 .

[2]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[3]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[4]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[5]  S. Ioffe,et al.  Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .

[6]  John N. Tsitsiklis,et al.  Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.

[7]  Dimitri P. Bertsekas,et al.  Temporal Dierences-Based Policy Iteration and Applications in Neuro-Dynamic Programming 1 , 1997 .

[8]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[9]  John N. Tsitsiklis,et al.  Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[10]  John N. Tsitsiklis,et al.  Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control..

[11]  Tim Hesterberg,et al.  Monte Carlo Strategies in Scientific Computing , 2002, Technometrics.

[12]  Dimitri P. Bertsekas,et al.  Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..

[13]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[14]  Steven J. Bradtke,et al.  Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.

[15]  Justin A. Boyan,et al.  Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.

[16]  A. Barto,et al.  Improved Temporal Difference Methods with Linear Function Approximation , 2004 .

[17]  Shie Mannor,et al.  Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..

[18]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[19]  David Choi,et al.  A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..

[20]  Shie Mannor,et al.  Automatic basis function construction for approximate dynamic programming and reinforcement learning , 2006, ICML.

[21]  Mario J. Valenti Approximate dynamic programming with applications in multi-agent systems , 2007 .

[22]  D. Bertsekas,et al.  A Least Squares Q-Learning Algorithm for Optimal Stopping Problems , 2007 .

[23]  Lihong Li,et al.  Analyzing feature generation for value-function approximation , 2007, ICML '07.

[24]  Vivek S. Borkar,et al.  A Learning Algorithm for Risk-Sensitive Cost , 2008, Math. Oper. Res..

[25]  Dimitri P. Bertsekas,et al.  Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.

[26]  Richard S. Sutton,et al.  Reinforcement Learning , 1992, Handbook of Machine Learning.