DAC: The Double Actor-Critic Architecture for Learning Options