Temporal Difference Methods for General Projected Equations
[1] B. Martinet,et al. Régularisation d'inéquations variationnelles par approximations successives , 1970 .
[2] M. A. Krasnoselʹskii,et al. Approximate Solution of Operator Equations , 1972 .
[3] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm , 1976 .
[4] D. Bertsekas,et al. Projection methods for variational inequalities with application to the traffic assignment problem , 1982 .
[5] C. Fletcher. Computational Galerkin Methods , 1983 .
[6] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[7] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[8] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[9] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[10] S. Ioffe,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .
[11] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[12] Dimitri P. Bertsekas,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1997 .
[13] Guanrong Chen,et al. Approximate Solutions of Operator Equations , 1997 .
[14] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[15] John N. Tsitsiklis,et al. Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[16] John N. Tsitsiklis,et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control..
[17] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[18] Benjamin Van Roy,et al. On the existence of fixed points for approximate value iteration and temporal-difference learning , 2000 .
[19] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[20] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[21] F. Facchinei,et al. Finite-Dimensional Variational Inequalities and Complementarity Problems , 2003 .
[22] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[23] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[24] A. Barto,et al. Improved Temporal Difference Methods with Linear Function Approximation , 2004 .
[25] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[26] David Choi,et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..
[27] Warren B. Powell,et al. Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .
[28] D. Bertsekas,et al. A Least Squares Q-Learning Algorithm for Optimal Stopping Problems , 2007 .
[29] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[30] Panos M. Pardalos,et al. Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..
[31] Dimitri P. Bertsekas,et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.
[32] D. Bertsekas. Projected Equations, Variational Inequalities, and Temporal Difference Methods , 2009 .
[33] D. Bertsekas,et al. Projected Equation Methods for Approximate Solution of Large Linear Systems , 2009, Journal of Computational and Applied Mathematics.
[34] Huizhen Yu,et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..
[35] Richard S. Sutton,et al. Reinforcement Learning , 1992, Handbook of Machine Learning.