Paper Information - Rates of Convergence for Variable Resolution Schemes in Optimal Control

This paper presents a general method for deriving tight rates of convergence for numerical approximations in optimal control on variable resolution grids. We study the continuous-space, discrete-time, discrete-controls case. Previous work described methods to obtain rates of convergence using general approximators (Bertsekas, 1987), multi-grid methods (Chow & Tsitsiklis, 1991), or random grids (Rust, 1996). These results bound the approximation error of the value function as a function of the space discretization resolution (or the number of grid points), which is assumed to be uniform. Consequently, they do not consider the benefit of using non-uniform resolutions. However, empirical results (Munos & Moore, 1999b) have shown the importance of variable resolution discretizations, especially for problems with high-dimensional state spaces (in order to attack the "curse of dimensionality"). This paper provides bounds on the approximation error of the value function in terms of the local interpolation error. Additionally, we are able to predict the effect that locally increasing the grid resolution has on the global performance of the controller, thus opening a way to design efficient grid-refinement procedures. This analysis can be applied to stochastic or deterministic problems and can be used with many grid interpolators, including grids based on kd-trees, multi-grids, random grids, or low-discrepancy grids.
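To make the setting in the abstract concrete, the following is a minimal sketch of value iteration on a one-dimensional, non-uniform grid with piecewise-linear interpolation, for a continuous-state, discrete-time, discrete-control problem. It is not the paper's method and does not compute the paper's bounds. The toy dynamics `step`, the reward `reward`, the discount factor, and both grids are hypothetical choices made only for illustration.

```python
from bisect import bisect_right


def interpolate(grid, values, x):
    """Piecewise-linear interpolation on a sorted, possibly non-uniform grid."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    j = bisect_right(grid, x)  # grid[j - 1] <= x < grid[j]
    x0, x1 = grid[j - 1], grid[j]
    w = (x - x0) / (x1 - x0)
    return (1.0 - w) * values[j - 1] + w * values[j]


def value_iteration(grid, controls, step, reward, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate V(x_i) <- max_u [ r(x_i, u) + gamma * V_interp(f(x_i, u)) ] on the grid nodes."""
    values = [0.0] * len(grid)
    for _ in range(max_iter):
        new_values = [
            max(reward(x, u) + gamma * interpolate(grid, values, step(x, u)) for u in controls)
            for x in grid
        ]
        change = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if change < tol:
            break
    return values


# Hypothetical toy problem: move left, stay, or move right on [0, 1];
# the reward penalizes distance from x = 0.5.
def step(x, u):
    return min(1.0, max(0.0, x + 0.1 * u))


def reward(x, u):
    return -(x - 0.5) ** 2


controls = (-1.0, 0.0, 1.0)
coarse = [0.0, 0.25, 0.5, 0.75, 1.0]                              # uniform grid
variable = [0.0, 0.25, 0.4, 0.45, 0.5, 0.55, 0.6, 0.75, 1.0]      # refined near x = 0.5

v_coarse = value_iteration(coarse, controls, step, reward)
v_variable = value_iteration(variable, controls, step, reward)
```

Because linear interpolation is a non-expansion in the sup norm, the update above remains a contraction with factor `gamma`, so the iteration converges on either grid. The second grid adds nodes only near x = 0.5, which is the kind of local refinement whose effect on the global error the abstract refers to.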

Andrew W. Moore | Rémi Munos

[1] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.

[2] J. Tsitsiklis, et al. An optimal one-way multigrid algorithm for discrete-time stochastic control, 1991.

[3] Harald Niederreiter, et al. Random number generation and Quasi-Monte Carlo methods, 1992, CBMS-NSF regional conference series in applied mathematics.

[4] H. Niederreiter. Nonlinear Methods for Pseudorandom Number and Vector Generation, 1992.

[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[6] Scott Davies, et al. Multidimensional Triangulation and Interpolation for Reinforcement Learning, 1996, NIPS.

[7] John Rust. Numerical dynamic programming in economics, 1996.

[8] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[9] Andrew W. Moore, et al. Barycentric Interpolators for Continuous Space and Time Reinforcement Learning, 1998, NIPS.

[10] Andrew W. Moore, et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems, 1999, IJCAI.

[11] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.