Doina Precup | Pierre-Luc Bacon | Martin Klissarov | Jean Harb
[1] Doina Precup. Temporal Abstraction in Reinforcement Learning, 2000, PhD thesis, University of Massachusetts Amherst.
[2] Richard S. Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[3] Emanuel Todorov et al. MuJoCo: A Physics Engine for Model-Based Control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[4] John Schulman et al. Trust Region Policy Optimization, 2015, ICML.
[5] Herbert A. Simon. The Sciences of the Artificial, 1970, MIT Press.
[6] Peter Henderson et al. Benchmark Environments for Multitask Learning in Continuous Domains, 2017, arXiv.
[7] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[8] Richard S. Sutton et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artificial Intelligence.
[9] Jean Harb et al. When Waiting Is Not an Option: Learning Options with a Deliberation Cost, 2018, AAAI.
[10] John Schulman et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2016, ICLR.
[11] John Schulman et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[12] Pierre-Luc Bacon et al. The Option-Critic Architecture, 2017, AAAI.