Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance
[1] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[2] Mykel J. Kochenderfer, et al. Weighted Double Q-learning, 2017, IJCAI.
[3] Hado van Hasselt. Double Q-learning, 2010, NIPS.
[4] Warren B. Powell, et al. Bias-corrected Q-learning to control max-operator bias in Q-learning, 2013, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[5] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1993, Proceedings of the Fourth Connectionist Models Summer School.
[6] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2016, AAAI.
[7] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, arXiv.
[8] Marcello Restelli, et al. Estimating Maximum Expected Value through Gaussian Approximation, 2016, ICML.
[9] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[10] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[11] Martha White, et al. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning, 2020, ICLR.
[12] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[13] Christopher J. C. H. Watkins. Learning from Delayed Rewards, 1989, Ph.D. thesis, University of Cambridge.
[14] Richard Bellman. Dynamic Programming and Stochastic Control Processes, 1958, Inf. Control.
[15] Christopher J. C. H. Watkins and Peter Dayan. Q-learning, 1992, Machine Learning.
[16] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2017, ICML.