Convergence Results for Some Temporal Difference Methods Based on Least Squares
[1] James M. Ortega, et al. Iterative solution of nonlinear equations in several variables, 2014, Computer science and applied mathematics.
[2] Stephen S. Wilson, et al. Random iterative models, 1996.
[3] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[4] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[5] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[6] Fernando J. Pineda, et al. Mean-Field Theory for Batched TD(λ), 1997, Neural Computation.
[7] Dimitri P. Bertsekas, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1997.
[8] Andrew G. Barto, et al. Reinforcement learning, 1998.
[9] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[10] John N. Tsitsiklis, et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.
[11] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[12] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[13] Benjamin Van Roy, et al. On the existence of fixed points for approximate value iteration and temporal-difference learning, 2000.
[14] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[15] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.
[16] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[17] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control Optim.
[18] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[19] John N. Tsitsiklis, et al. On Average Versus Discounted Reward Temporal-Difference Learning, 2002, Machine Learning.
[20] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[21] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[22] A. Barto, et al. Improved Temporal Difference Methods with Linear Function Approximation, 2004.
[23] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[24] Huizhen Yu, et al. A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies, 2005, UAI.
[25] Vivek S. Borkar, et al. Stochastic approximation with 'controlled Markov' noise, 2006, Systems & Control Letters.
[26] D. Bertsekas, et al. A Least Squares Q-Learning Algorithm for Optimal Stopping Problems, 2007.
[27] D. Bertsekas, et al. Q-learning algorithms for optimal stopping based on least squares, 2007, European Control Conference (ECC).
[28] Dimitri P. Bertsekas, et al. New error bounds for approximations from projected linear equations, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.
[29] D. Bertsekas, et al. Projected Equation Methods for Approximate Solution of Large Linear Systems, 2009, Journal of Computational and Applied Mathematics.
[30] Dimitri P. Bertsekas, et al. Error Bounds for Approximations from Projected Linear Equations, 2010, Math. Oper. Res.