Yuval Tassa | David Silver | Daan Wierstra | Nicolas Heess | Jonathan J. Hunt | Alexander Pritzel | Tom Erez | Timothy P. Lillicrap
[1] R. Mazo. On the theory of Brownian motion , 1973 .
[2] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[3] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[4] E. Todorov,et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems , 2005, Proceedings of the 2005 American Control Conference.
[5] Pawel Wawrzynski,et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay , 2009, Neural Networks.
[6] P. Dayan,et al. States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning , 2010, Neuron.
[7] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[8] Martin A. Riedmiller,et al. Reinforcement learning in feedback control , 2011, Machine Learning.
[9] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[10] Yuval Tassa,et al. Synthesis and stabilization of complex behaviors through online trajectory optimization , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[11] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[12] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[13] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[14] Ajay Kumar Tanwani,et al. Autonomous reinforcement learning with experience replay. , 2013, Neural networks : the official journal of the International Neural Network Society.
[15] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[16] Jürgen Schmidhuber,et al. Online Evolution of Deep Convolutional Network for Vision-Based Reinforcement Learning , 2014, SAB.
[17] Jürgen Schmidhuber,et al. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning , 2014, GECCO.
[18] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[19] Thomas B. Schön,et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models , 2015, ICML 2015.
[20] Muhammad Ghifary,et al. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies , 2015, ArXiv.
[21] Pieter Abbeel,et al. Gradient Estimation Using Stochastic Computation Graphs , 2015, NIPS.
[22] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[23] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[24] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[25] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[26] David Silver,et al. Memory-based control with recurrent neural networks , 2015, ArXiv.
[27] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[28] Xinyun Chen. Delving into Transferable Adversarial Examples and Black-box Attacks , 2016, under review at ICLR 2017.
[29] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res.
[30] Omer Levy,et al. Simulating Action Dynamics with Neural Process Networks , 2018, ICLR.