Matteo Hessel | Florian Strub | Joseph Modayil | Hado van Hasselt | Yotam Doron | Nicolas Sonnerat
[1] Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, 1992, Machine Learning.
[2] Hamid R. Maei. Gradient Temporal-Difference Learning Algorithms, 2011.
[3] Volodymyr Mnih et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[4] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning, 1998.
[5] Hado van Hasselt. Double Q-learning, 2010, NIPS.
[6] Ian H. Witten. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[7] Tom Schaul et al. Prioritized Experience Replay, 2016, ICLR.
[8] Volodymyr Mnih et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[9] Dan Horgan et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[10] Hado van Hasselt et al. Deep Reinforcement Learning with Double Q-Learning, 2016, AAAI.
[11] Andrew G. Barto et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[12] Volodymyr Mnih et al. Human-level control through deep reinforcement learning, 2015, Nature.
[13] Richard S. Sutton et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2016, J. Mach. Learn. Res.
[14] Ziyu Wang et al. Dueling Network Architectures for Deep Reinforcement Learning, 2016, ICML.
[15] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization, 2015, ICLR.
[16] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[17] John N. Tsitsiklis and Benjamin Van Roy. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[18] Ronald A. Howard. Dynamic Programming and Markov Processes, 1960.
[19] Shalabh Bhatnagar et al. Natural actor-critic algorithms, 2009, Autom.
[20] Hado van Hasselt. Reinforcement Learning in Continuous State and Action Spaces, 2012, Reinforcement Learning.
[21] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[22] Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[23] R. Sutton. On The Virtues of Linear Learning and Trajectory Distributions, 2007.
[24] Thomas Degris et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[25] Doina Precup et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[26] Richard S. Sutton et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[27] Matteo Hessel et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2018, AAAI.
[28] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[29] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[30] Richard S. Sutton et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[31] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[32] Hamid R. Maei et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[33] Marc G. Bellemare et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2013, J. Artif. Intell. Res.