Richard S. Sutton | J. Fernando Hernandez-Garcia
[1] Marc G. Bellemare, et al. The Reactor: A Sample-Efficient Actor-Critic Architecture, 2017, arXiv.
[2] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[3] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[4] Michael Kearns, et al. Bias-Variance Error Bounds for Temporal Difference Updates, 2000, COLT.
[5] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[6] Gavin Adrian Rummery. Problem solving with reinforcement learning, 1995.
[7] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[8] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[9] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[10] Chris Watkins. Learning from delayed rewards, 1989.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[13] Richard S. Sutton, et al. Multi-step Reinforcement Learning: A Unifying Algorithm, 2017, AAAI.