Data-efficient Hindsight Off-policy Option Learning
Martin A. Riedmiller | N. Heess | Roland Hafner | Dhruva Tirumala | Markus Wulfmeier | A. Abdolmaleki | Michael Neunert | Tim Hertweck | Noah Siegel | Dushyant Rao | Thomas Lampe
[1] Tom Schaul,et al. FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.
[2] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell.
[3] Martin A. Riedmiller,et al. Regularized Hierarchical Policies for Compositional Transfer in Robotics , 2019, ArXiv.
[4] Ion Stoica,et al. Multi-Level Discovery of Deep Options , 2017, ArXiv.
[5] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[6] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[7] Kate Saenko,et al. Learning Multi-Level Hierarchies with Hindsight , 2017, ICLR.
[8] Pieter Abbeel,et al. Sub-policy Adaptation for Hierarchical Reinforcement Learning , 2019, ICLR.
[9] Doina Precup,et al. The Termination Critic , 2019, AISTATS.
[10] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[11] Sergey Levine,et al. Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.
[12] Doina Precup,et al. Off-policy Learning with Options and Recognizers , 2005, NIPS.
[13] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[14] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[15] Martin A. Riedmiller,et al. Compositional Transfer in Hierarchical Reinforcement Learning , 2019, Robotics: Science and Systems.
[16] C. Bishop. Mixture density networks , 1994.
[17] Pieter Abbeel,et al. Meta Learning Shared Hierarchies , 2017, ICLR.
[18] Gerald Tesauro,et al. Learning Abstract Options , 2018, NeurIPS.
[19] Yee Whye Teh,et al. Information asymmetry in KL-regularized RL , 2019, ICLR.
[20] Karol Hausman,et al. Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.
[21] Shimon Whiteson,et al. DAC: The Double Actor-Critic Architecture for Learning Options , 2019, NeurIPS.
[22] Yuval Tassa,et al. Relative Entropy Regularized Policy Iteration , 2018, ArXiv.
[23] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[24] Shimon Whiteson,et al. TACO: Learning Task Decomposition via Temporal Alignment for Control , 2018, ICML.
[25] Yee Whye Teh,et al. Exploiting Hierarchy for Learning and Transfer in KL-regularized RL , 2019, ArXiv.
[26] Alejandro Agostini,et al. Reinforcement Learning with a Gaussian mixture model , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).
[27] Joelle Pineau,et al. An Inference-Based Policy Gradient Method for Learning Options , 2018, ICML.
[28] Jan Peters,et al. Hierarchical Relative Entropy Policy Search , 2014, AISTATS.
[29] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[30] Doina Precup,et al. When Waiting is not an Option: Learning Options with a Deliberation Cost , 2017, AAAI.
[31] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.
[32] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[33] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[34] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
[35] Sergey Levine,et al. Latent Space Policies for Hierarchical Reinforcement Learning , 2018, ICML.
[36] Marcin Andrychowicz,et al. Asymmetric Actor Critic for Image-Based Robot Learning , 2017, Robotics: Science and Systems.
[37] Yee Whye Teh,et al. Distral: Robust multitask reinforcement learning , 2017, NIPS.
[38] Shimon Whiteson,et al. Multitask Soft Option Learning , 2019, UAI.
[39] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res.
[40] Thomas G. Dietterich,et al. To transfer or not to transfer , 2005, NIPS 2005.
[41] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[42] Yuval Tassa,et al. Learning and Transfer of Modulated Locomotor Controllers , 2016, ArXiv.
[43] Sergey Levine,et al. Near-Optimal Representation Learning for Hierarchical Reinforcement Learning , 2018, ICLR.
[44] Ion Stoica,et al. DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations , 2017, CoRL.
[45] Misha Denil,et al. The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously , 2017, CoRL.
[46] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.