Marc G. Bellemare | Rémi Munos | Anna Harutyunyan | Tom Stepleton
[1] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[2] Richard S. Sutton, et al. Off-policy learning based on weighted importance sampling with linear computational complexity, 2015, UAI.
[3] H. Kahn, et al. Methods of Reducing Sample Size in Monte Carlo Computations, 1953, Oper. Res.
[4] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[5] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[6] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[7] Richard S. Sutton, et al. True Online TD(λ), 2014, ICML.
[8] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[9] Hado Philip van Hasselt, et al. Insights in reinforcement learning: formal analysis and empirical evaluation of temporal-difference learning algorithms, 2011.
[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[11] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[12] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[13] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[14] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[15] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[16] R. Sutton, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014.
[17] Doina Precup, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014, ICML.
[18] Michael Kearns, et al. Bias-Variance Error Bounds for Temporal Difference Updates, 2000, COLT.
[19] Peter Dayan, et al. Analytical Mean Squared Error Curves for Temporal Difference Learning, 1996, Machine Learning.
[20] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[21] Shie Mannor, et al. Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis, 2015, AAAI.
[22] Hado van Hasselt, et al. Insights in reinforcement learning: formal analysis and empirical evaluation of temporal-difference learning, 2010.
[23] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.
[24] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[25] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[26] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[27] Martha White, et al. Emphatic Temporal-Difference Learning, 2015, arXiv.
[28] Richard S. Sutton, et al. True Online TD(λ), 2014, ICML.