Shane Legg | Rémi Munos | Iain Dunning | Karen Simonyan | Koray Kavukcuoglu | Tim Harley | Volodymyr Mnih | Yotam Doron | Hubert Soyer | Tom Ward | Lasse Espeholt | Vlad Firoiu
[1] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[2] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[3] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[4] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[5] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[6] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.
[7] Pawel Wawrzynski, et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay, 2009, Neural Networks.
[8] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[9] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, arXiv.
[10] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[11] Shane Legg, et al. Massively Parallel Methods for Deep Reinforcement Learning, 2015, arXiv.
[12] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[13] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[14] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[15] Marc G. Bellemare, et al. Q(λ) with Off-Policy Corrections, 2016, ALT.
[16] Samy Bengio, et al. Revisiting Distributed Synchronous SGD, 2016, arXiv.
[17] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[18] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[19] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[20] Phil Blunsom, et al. Optimizing Performance of Recurrent Neural Networks on GPUs, 2016, arXiv.
[21] Stephen Tyree, et al. GA3C: GPU-based A3C for Deep Reinforcement Learning, 2016, arXiv.
[22] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[23] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[24] Demis Hassabis, et al. Grounded Language Learning in a Simulated 3D World, 2017, arXiv.
[25] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[26] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.
[27] Koray Kavukcuoglu, et al. Combining policy gradient and Q-learning, 2016, ICLR.
[28] Martín Abadi, et al. A computational model for TensorFlow: an introduction, 2017, MAPL@PLDI.
[29] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[30] Arjun Chandra, et al. Efficient Parallel Methods for Deep Reinforcement Learning, 2017, arXiv.
[31] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, arXiv.
[32] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[33] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[34] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[35] Henryk Michalewski, et al. Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes, 2018, ISC.
[36] Shane Legg, et al. Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents, 2018, arXiv.
[37] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Marc G. Bellemare, et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning, 2017, ICLR.