On Convergence of Emphatic Temporal-Difference Learning
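For context, the algorithm whose convergence this paper analyzes is emphatic TD(λ) with linear function approximation, introduced in reference [20] below (Sutton, Mahmood and White). A minimal one-step sketch of the update follows as orientation; the function name, default parameter values, and the constant-discount, constant-λ simplification are illustrative assumptions rather than the paper's own notation.

```python
import numpy as np

def etd_lambda_step(theta, e, F, phi_t, phi_next, reward,
                    rho_t, rho_prev, gamma=0.9, lam=0.5,
                    interest=1.0, alpha=0.01):
    """One step of linear emphatic TD(lambda); a sketch following
    reference [20], with illustrative names and default values."""
    # Follow-on trace: discounted, importance-weighted accumulation of interest.
    F = rho_prev * gamma * F + interest
    # Emphasis assigned to the current state.
    M = lam * interest + (1.0 - lam) * F
    # Emphasis- and importance-weighted eligibility trace.
    e = rho_t * (gamma * lam * e + M * phi_t)
    # Standard TD error under linear function approximation.
    delta = reward + gamma * np.dot(theta, phi_next) - np.dot(theta, phi_t)
    # Semi-gradient update of the weights along the trace.
    theta = theta + alpha * delta * e
    return theta, e, F
```

The caller maintains theta, the eligibility trace e, and the follow-on trace F across time steps (e and F initialized to zero), and supplies the importance-sampling ratios rho_t and rho_prev of the current and previous actions under the target and behavior policies.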
[1] D. Bertsekas and H. Yu. Projected Equation Methods for Approximate Solution of Large Linear Systems. Journal of Computational and Applied Mathematics, 2009.
[2] Sean P. Meyn. Control Techniques for Complex Networks, 2007.
[3] Jan Peters et al. Policy evaluation with temporal differences: a survey and comparison. J. Mach. Learn. Res., 2015.
[4] John N. Tsitsiklis. Asynchronous Stochastic Approximation and Q-Learning. Machine Learning, 1994.
[5] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[6] R. S. Varga. Matrix Iterative Analysis, 1962.
[7] P. Billingsley. Convergence of Probability Measures, 1968.
[8] Peter Stone et al. Reinforcement learning. Scholarpedia, 2019.
[9] Martha White et al. Emphatic Temporal-Difference Learning. arXiv, 2015.
[10] W. Rudin. Real and Complex Analysis, 1968.
[11] Richard S. Sutton. Learning to Predict by the Methods of Temporal Differences. Machine Learning, 1988.
[12] Matthieu Geist et al. Off-policy learning with eligibility traces: a survey. J. Mach. Learn. Res., 2013.
[13] H. Kushner et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[14] Sean P. Meyn et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning. SIAM J. Control Optim., 2000.
[15] J. Neveu. Discrete Parameter Martingales, 1975.
[16] Harold J. Kushner et al. Stochastic Approximation Methods for Constrained and Unconstrained Systems, 1978.
[17] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[18] Richard S. Sutton et al. Weighted importance sampling for off-policy learning with linear function approximation. NIPS, 2014.
[19] H. R. Maei. Gradient Temporal-Difference Learning Algorithms. PhD thesis, University of Alberta, 2011.
[20] R. S. Sutton, A. R. Mahmood, and M. White. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. J. Mach. Learn. Res., 2015.
[21] Sanjoy Dasgupta et al. Off-Policy Temporal Difference Learning with Function Approximation. ICML, 2001.
[22] Donald L. Iglehart et al. Importance sampling for stochastic simulations, 1989.
[23] D. Bertsekas et al. Weighted Bellman Equations and their Applications in Approximate Dynamic Programming, 2012.
[24] Bruno Scherrer. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view. ICML, 2010.
[25] Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation. ICML, 1995.
[26] R. S. Randhawa et al. Combining importance sampling and temporal difference control variates to simulate Markov Chains. TOMC, 2004.
[27] Huizhen Yu. Least Squares Temporal Difference Methods: An Analysis under General Conditions. SIAM J. Control Optim., 2012.
[28] R. M. Dudley. Real Analysis and Probability, 2002.
[29] Justin A. Boyan. Least-Squares Temporal Difference Learning. ICML, 1999.
[30] John N. Tsitsiklis and Benjamin Van Roy. Analysis of temporal-difference learning with function approximation. NIPS, 1996.
[31] Sean P. Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Communications and Control Engineering Series, 1993.
[32] Sheldon M. Ross. Stochastic Processes. Wiley, 1996.
[33] Richard S. Sutton. TD Models: Modeling the World at a Mixture of Time Scales. ICML, 1995.
[34] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.