Volodymyr Mnih | Adrià Puigdomènech Badia | Mehdi Mirza | Alex Graves | Timothy P. Lillicrap | Tim Harley | David Silver | Koray Kavukcuoglu
[1] Dimitri Bertsekas,et al. Distributed dynamic programming , 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.
[2] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[3] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[4] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[5] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[6] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[7] Christopher J. C. H. Watkins. Learning from delayed rewards , 1989, PhD thesis, University of Cambridge.
[8] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[9] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[10] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[11] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[12] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[13] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[14] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[15] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[16] Christos Dimitrakakis,et al. TORCS, The Open Racing Car Simulator , 2005 .
[17] Daniel Kudenko,et al. Parallel reinforcement learning with linear function approximation , 2007, AAMAS '07.
[19] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[20] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[21] Dale Schuurmans,et al. MapReduce for Parallel Reinforcement Learning , 2011, EWRL.
[22] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[23] Patrick M. Pilarski,et al. Model-Free reinforcement learning with continuous action in practice , 2012, 2012 American Control Conference (ACC).
[24] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[25] Christos Dimitrakakis,et al. TORCS, The Open Racing Car Simulator, v1.3.5 , 2013 .
[26] Jürgen Schmidhuber,et al. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning , 2014, GECCO.
[27] Hao Yi Ong,et al. Distributed Deep Q-Learning , 2015, ArXiv.
[28] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[29] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[30] Shane Legg,et al. Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.
[32] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[33] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[34] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[35] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[36] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[37] Patrick M. Pilarski,et al. True Online Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[38] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[39] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[40] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[41] Marc G. Bellemare,et al. Increasing the Action Gap: New Operators for Reinforcement Learning , 2015, AAAI.