Pieter Abbeel | Xi Chen | John Schulman | Jonathan Ho | Kevin Frans
[1] Tom Schaul, et al. FeUdal Networks for Hierarchical Reinforcement Learning, 2017, ICML.
[2] Andrew G. Barto, et al. Motor primitive discovery, 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[3] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[4] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[5] Jan Peters, et al. Probabilistic inference for determining options in reinforcement learning, 2016, Machine Learning.
[6] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[7] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.
[8] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[9] Matthew E. Taylor, et al. Autonomous Extracting a Hierarchical Structure of Tasks in Reinforcement Learning and Multi-task Reinforcement Learning, 2017, ArXiv.
[10] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[11] Andrew G. Barto, et al. Conjugate Markov Decision Processes, 2011, ICML.
[12] Joelle Pineau, et al. OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning, 2017, AAAI.
[13] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[14] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[15] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[16] Alex Graves, et al. Strategic Attentive Writer for Learning Macro-Actions, 2016, NIPS.
[17] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[18] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[19] Yuval Tassa, et al. Learning and Transfer of Modulated Locomotor Controllers, 2016, ArXiv.
[20] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[21] Pieter Abbeel, et al. Meta-Learning with Temporal Convolutions, 2017, ArXiv.
[22] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[23] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[24] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[25] Philip S. Thomas, et al. Policy Gradient Coagent Networks, 2011, NIPS.