Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies
Haim Kaplan | Avinatan Hassidim | Yishay Mansour | Tom Zahavy
[1] David S. Johnson, et al. The Traveling Salesman Problem: A Case Study in Local Optimization, 2008.
[2] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[3] Romain Laroche, et al. Hybrid Reward Architecture for Reinforcement Learning, 2017, NIPS.
[4] Asaf Levin, et al. Discounted Reward TSP, 2016, Algorithmica.
[5] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.
[6] Shie Mannor, et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft, 2016, AAAI.
[7] Doina Precup, et al. The Option Keyboard: Combining Skills in Reinforcement Learning, 2021, NeurIPS.
[8] Tom Schaul, et al. FeUdal Networks for Hierarchical Reinforcement Learning, 2017, ICML.
[9] David R. Karger, et al. Approximation algorithms for orienteering and discounted-reward TSP, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.
[10] Joshua B. Tenenbaum, et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, 2016, NIPS.
[11] Wojciech Jaskowski, et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning, 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).
[12] Tom Schaul, et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement, 2018, ICML.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Honglak Lee, et al. Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning, 2017, ICML.
[15] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[16] Jonas Karlsson, et al. Learning to Solve Multiple Goals, 1997.
[17] Sergiu Hart, et al. The Absent-Minded Driver, 1996, TARK.
[18] Tom Schaul, et al. Universal Successor Features Approximators, 2018, ICLR.
[19] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[20] Christopher Burgess, et al. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning, 2017, ICML.
[21] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[22] Stuart J. Russell, et al. Q-Decomposition for Reinforcement Learning Agents, 2003, ICML.
[23] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[24] Mark Humphreys, et al. Action selection methods using reinforcement learning, 1997.
[25] Haim Kaplan, et al. Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies, 2018, ArXiv.
[26] Andrew Chi-Chih Yao, et al. Probabilistic computations: Toward a unified measure of complexity, 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).
[27] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.
[28] Dana H. Ballard, et al. Multiple-Goal Reinforcement Learning with Modular Sarsa(0), 2003, IJCAI.
[29] M. Held, et al. A dynamic programming approach to sequencing problems, 1962, ACM National Meeting.
[30] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.
[31] Richard Bellman, et al. Dynamic Programming Treatment of the Travelling Salesman Problem, 1962, JACM.
[32] Shie Mannor, et al. Approximate Value Iteration with Temporally Extended Actions, 2015, J. Artif. Intell. Res.
[33] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[34] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[35] Daniel J. Rosenkrantz, et al. An Analysis of Several Heuristics for the Traveling Salesman Problem, 1977, SIAM J. Comput.
[36] David Warde-Farley, et al. Fast Task Inference with Variational Intrinsic Successor Features, 2019, ICLR.
[37] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.