VA-learning as a more efficient alternative to Q-learning