暂无分享,去创建一个
David Budden | David Silver | Matteo Hessel | Hado van Hasselt | Gabriel Barth-Maron | Dan Horgan | John Quan | Dan Horgan | D. Budden | D. Silver | Gabriel Barth-Maron | Matteo Hessel | H. V. Hasselt | John Quan | David Silver
[1] W. K. Hastings,et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .
[2] R. Mazo. On the theory of brownian motion , 1973 .
[3] Richard S. Sutton,et al. A Menu of Designs for Reinforcement Learning Over Time , 1995 .
[4] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.
[5] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[6] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[7] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[8] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[9] Geoffrey E. Hinton,et al. To recognize shapes, first learn to generate images. , 2007, Progress in brain research.
[10] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[11] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[12] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[13] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[14] David Silver,et al. Concurrent Reinforcement Learning from Customer Interactions , 2013, ICML.
[15] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[16] Alex Krizhevsky,et al. One weird trick for parallelizing convolutional neural networks , 2014, ArXiv.
[17] Frank Hutter,et al. Online Batch Selection for Faster Training of Neural Networks , 2015, ArXiv.
[18] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[19] Yoshua Bengio,et al. Variance Reduction in SGD by Distributed Importance Sampling , 2015, ArXiv.
[20] Shane Legg,et al. Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.
[21] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[22] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[23] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[24] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[25] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[26] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[27] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[28] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[29] Stephen Tyree,et al. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU , 2016, ICLR.
[30] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[31] Arjun Chandra,et al. Efficient Parallel Methods for Deep Reinforcement Learning , 2017, ArXiv.
[32] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.
[33] Tom Schaul,et al. Learning from Demonstrations for Real World Reinforcement Learning , 2017, ArXiv.
[34] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[35] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[36] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.