[1] Sergey Levine et al. Data-Efficient Hierarchical Reinforcement Learning. NeurIPS, 2018.
[2] Sergey Levine et al. Trust Region Policy Optimization. ICML, 2015.
[3] Doina Precup et al. On Efficiency in Hierarchical Reinforcement Learning. NeurIPS, 2020.
[4] Kenneth O. Stanley et al. First return, then explore. Nature, 2021.
[5] Kate Saenko et al. Learning Multi-Level Hierarchies with Hindsight. ICLR, 2017.
[6] Demis Hassabis et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 2019.
[7] Sridhar Mahadevan. Proto-value functions: developmental reinforcement learning. ICML, 2005.
[8] Alessandro Lazaric et al. Exploration-Exploitation in MDPs with Options, 2016.
[9] Shie Mannor et al. Time-Regularized Interrupting Options (TRIO). ICML, 2014.
[10] Emma Brunskill et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds. ICML, 2019.
[11] Zeb Kurth-Nelson et al. Alchemy: A structured task distribution for meta-reinforcement learning. arXiv, 2021.
[12] Marlos C. Machado et al. A Laplacian Framework for Option Discovery in Reinforcement Learning. ICML, 2017.
[13] Sergey Levine et al. Diversity is All You Need: Learning Skills without a Reward Function. ICLR, 2018.
[14] Wojciech M. Czarnecki et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 2019.
[15] Michael I. Jordan et al. On the Theory of Transfer Learning: The Importance of Task Diversity. NeurIPS, 2020.
[16] Andrew G. Barto et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning. ICML, 2004.
[17] Lihong Li et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning. ICML, 2014.
[18] Andrew G. Barto et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density. ICML, 2001.
[19] Akshay Krishnamurthy et al. Reward-Free Exploration for Reinforcement Learning. ICML, 2020.
[20] Michal Valko et al. Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited. ALT, 2021.
[21] Sergey Levine et al. Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings. ICML, 2018.
[22] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning. ICML, 1998.
[23] Sergey Levine et al. Near-Optimal Representation Learning for Hierarchical Reinforcement Learning. ICLR, 2018.
[24] Sergey Levine et al. Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? arXiv, 2019.
[25] Marlos C. Machado et al. Eigenoption Discovery through the Deep Successor Representation. ICLR, 2017.
[26] Pieter Abbeel et al. Meta Learning Shared Hierarchies. ICLR, 2017.
[27] Tom Schaul et al. FeUdal Networks for Hierarchical Reinforcement Learning. ICML, 2017.
[28] Shie Mannor et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations. ICML, 2014.
[29] Sham M. Kakade et al. Few-Shot Learning via Learning the Representation, Provably. ICLR, 2020.
[30] Sergey Levine et al. Dynamics-Aware Unsupervised Discovery of Skills. ICLR, 2019.
[31] Rémi Munos et al. Minimax Regret Bounds for Reinforcement Learning. ICML, 2017.
[32] Shie Mannor et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning. ECML, 2002.
[33] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999.
[34] Stuart J. Russell et al. Reinforcement Learning with Hierarchies of Machines. NIPS, 1997.