Least Squares Temporal Difference Methods: An Analysis under General Conditions
[1] D. Bertsekas,et al. Projected Equation Methods for Approximate Solution of Large Linear Systems , 2022, Journal of Computational and Applied Mathematics.
[2] Sean P. Meyn. Control Techniques for Complex Networks: Workload , 2007 .
[3] Vivek S. Borkar,et al. Stochastic approximation with 'controlled Markov' noise , 2006, Systems & control letters (Print).
[4] Sheldon M. Ross,et al. Stochastic Processes , 2018, Gauge Integral Structures for Stochastic Calculus and Quantum Electrodynamics.
[5] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[6] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[7] S. Meyn. Ergodic theorems for discrete time stochastic systems using a stochastic lyapunov function , 1989 .
[8] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[9] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[10] Dimitri P. Bertsekas,et al. Stochastic optimal control : the discrete time case , 2007 .
[11] Alessandro Lazaric,et al. Finite-Sample Analysis of LSTD , 2010, ICML.
[12] D. Bertsekas,et al. Weighted Bellman Equations and their Applications in Approximate Dynamic Programming , 2012 .
[13] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[14] Richard S. Sutton,et al. GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[15] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[16] Dimitri P. Bertsekas,et al. Error Bounds for Approximations from Projected Linear Equations , 2010, Math. Oper. Res..
[17] R. S. Randhawa,et al. Combining importance sampling and temporal difference control variates to simulate Markov Chains , 2004, TOMC.
[18] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[19] Dimitri P. Bertsekas,et al. Temporal Difference Methods for General Projected Equations , 2011, IEEE Transactions on Automatic Control.
[20] R. Cooke. Real and Complex Analysis , 2011 .
[21] Bruno Scherrer,et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view , 2010, ICML.
[22] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .
[23] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[24] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[25] Donald L. Iglehart,et al. Importance sampling for stochastic simulations , 1989 .
[26] Aarnout Brombacher,et al. Probability... , 2009, Qual. Reliab. Eng. Int..
[27] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[28] Zhi-Qiang Liu,et al. Preconditioned temporal difference learning , 2008, ICML '08.
[29] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[30] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[31] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[32] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[33] D. Bertsekas. Projected Equations, Variational Inequalities, and Temporal Difference Methods , 2009 .
[34] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[35] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..
[36] Vivek S. Borkar,et al. Adaptive Importance Sampling Technique for Markov Chains Using Stochastic Approximation , 2006, Oper. Res..
[37] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[38] Dudley,et al. Real Analysis and Probability: Measurability: Borel Isomorphism and Analytic Sets , 2002 .