David Silver | Nicolas Heess | Jonathan J. Hunt | Timothy P. Lillicrap
[1] Tom M. Mitchell, et al. Reinforcement learning with hidden states, 1993.
[2] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[3] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[4] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[5] Leslie Pack Kaelbling, et al. Learning Policies with External Memory, 1999, ICML.
[6] Bram Bakker, et al. Reinforcement Learning with Long Short-Term Memory, 2001, NIPS.
[7] R. D'Hooge, et al. Applications of the Morris water maze in the study of learning and memory, 2001, Brain Research Reviews.
[8] Sebastian Thrun, et al. Probabilistic robotics, 2002, CACM.
[9] Geoffrey E. Hinton, et al. Reinforcement learning for factored Markov decision processes, 2002.
[10] Jürgen Schmidhuber, et al. Solving Deep Memory POMDPs with Recurrent Policy Gradients, 2007, ICANN.
[11] Jürgen Schmidhuber, et al. Policy Gradient Critics, 2007, ECML.
[12] Katsunari Shibata, et al. Contextual Behaviors and Internal Representations Acquired by Reinforcement Learning with a Recurrent Neural Network in a Continuous State and Action Space Task, 2008, ICONIP.
[13] F.L. Lewis, et al. Reinforcement learning and adaptive dynamic programming for feedback control, 2009, IEEE Circuits and Systems Magazine.
[14] Martin A. Riedmiller, et al. Reinforcement learning in feedback control, 2011, Machine Learning.
[15] Guy Shani, et al. A survey of point-based POMDP solvers, 2013, Autonomous Agents and Multi-Agent Systems.
[17] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[18] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[19] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[20] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[21] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[22] Stefan Carlsson, et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[23] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[24] Sergey Levine, et al. Policy Learning with Continuous Memory States for Partially Observed Robotic Control, 2015, ArXiv.
[25] Muhammad Ghifary, et al. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies, 2015, ArXiv.
[26] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[27] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[29] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[30] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[31] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[32] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[33] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.