New error bounds for approximations from projected linear equations

We consider linear fixed point equations and their approximations obtained by projection onto a low-dimensional subspace. We derive new bounds on the approximation error of the solution, which are expressed in terms of low-dimensional matrices and can be computed by simulation. When the fixed point mapping is a contraction, as is typically the case in Markov decision processes (MDPs), one of our bounds is always sharper than the standard worst-case bounds, and another is often sharper. Our bounds also apply to the non-contraction case, including policy evaluation in MDPs with nonstandard projections that enhance exploration. To our knowledge, no error bounds are currently available for this case.
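As background for these claims, the following is a minimal sketch of the projected-equation setup and the standard contraction-based bound; the symbols $\Phi$, $\Pi$, $\xi$, and $\alpha$ are notational assumptions introduced here for illustration, not taken from the abstract.
\[
  x^{*} = A x^{*} + b,
  \qquad
  \Phi r^{*} = \Pi\bigl(A \Phi r^{*} + b\bigr),
\]
where $\Pi$ denotes projection onto the subspace $S = \{\Phi r : r \in \Re^{s}\}$ with respect to a weighted Euclidean norm $\|\cdot\|_{\xi}$. If $\Pi A$ is a contraction of modulus $\alpha \in [0,1)$ in $\|\cdot\|_{\xi}$, the standard worst-case bound is
\[
  \|x^{*} - \Phi r^{*}\|_{\xi}
  \;\le\; \frac{1}{\sqrt{1-\alpha^{2}}}\, \|x^{*} - \Pi x^{*}\|_{\xi},
\]
i.e., the error of the projected solution is within a fixed constant factor of the best approximation error achievable on $S$. The new bounds described above are computable, data-dependent alternatives to this fixed multiplicative constant.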
