New error bounds for approximations from projected linear equations
[1] Andrew G. Barto, et al. Reinforcement Learning, 1998.
[2] Charles R. Johnson, et al. Matrix Analysis, 1985.
[3] Benjamin Van Roy, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[4] M. A. Krasnoselʹskii, et al. Approximate Solution of Operator Equations, 1972.
[5] Dimitri P. Bertsekas, et al. A Counterexample to Temporal Differences Learning, 1995, Neural Computation.
[6] John N. Tsitsiklis, et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.
[7] Daniel B. Szyld, et al. The many proofs of an identity on the norm of oblique projections, 2006, Numerical Algorithms.
[8] D. Bertsekas, et al. New error bounds for approximations from projected linear equations, 2008, Allerton Conference.
[9] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[10] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[11] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[12] David Choi, et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, 2001, Discret. Event Dyn. Syst.
[13] D. Bertsekas, et al. A Least Squares Q-Learning Algorithm for Optimal Stopping Problems, 2007.
[14] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[15] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[16] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[17] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[18] D. Bertsekas, et al. Projected Equation Methods for Approximate Solution of Large Linear Systems, 2009, Journal of Computational and Applied Mathematics.
[19] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[20] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[21] Benjamin Van Roy. On Regression-Based Stopping Times, 2010, Discret. Event Dyn. Syst.
[22] Richard S. Sutton, et al. Dimensions of Reinforcement Learning, 1998.
[23] D. Bertsekas, et al. Q-learning algorithms for optimal stopping based on least squares, 2007, European Control Conference (ECC).
[24] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[25] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.