Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift
[1] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[2] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[3] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[4] J. Zico Kolter, et al. The Fixed Points of Off-Policy TD, 2011, NIPS.
[5] Nando de Freitas, et al. Playing hard exploration games by watching YouTube, 2018, NeurIPS.
[6] Marc G. Bellemare, et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning, 2017, ICLR.
[7] Long Ji Lin, et al. Scaling Up Reinforcement Learning for Robot Control, 1993, International Conference on Machine Learning.
[8] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[9] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[10] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[11] Michael Bowling, et al. Dual Representations for Dynamic Programming, 2008.
[12] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[13] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[14] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[17] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[18] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[19] Etienne Barnard, et al. Temporal-difference methods and Markov models, 1993, IEEE Trans. Syst. Man Cybern.
[20] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[21] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[22] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[23] Sean R Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[24] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[25] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.