Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Hierarchical reinforcement learning (HRL) has demonstrated significant success at solving difficult reinforcement learning tasks. Previous works have motivated the use of hierarchy by appealing to a number of intuitive benefits, including learning over temporally extended transitions, exploring over temporally extended horizons, and training and exploring in a more semantically meaningful action space. However, in fully observed, Markovian settings, it is not immediately clear why hierarchical RL should provide benefits over standard "shallow" RL architectures. In this work, we isolate and evaluate the claimed benefits of hierarchical RL on a suite of tasks encompassing locomotion, navigation, and manipulation. Surprisingly, we find that most of the observed benefits of hierarchy can be attributed to improved exploration, rather than to easier policy learning or the imposed hierarchical structure itself. Given this insight, we present exploration techniques inspired by hierarchy that achieve performance competitive with hierarchical RL while being much simpler to use and implement.
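One way to picture "hierarchy-inspired exploration" is an agent that commits to a single member of a policy ensemble for several consecutive steps, so that exploration is temporally correlated much like a high-level controller committing to a low-level behavior. The sketch below is a minimal, hypothetical illustration of such a switching-ensemble wrapper, not the paper's actual algorithm; the class name, the `switch_period` parameter, and the toy linear policies are illustrative assumptions.

```python
import numpy as np


class SwitchingEnsembleExplorer:
    """Temporally extended exploration via an ensemble of policies (sketch).

    Hypothetical: hold one ensemble member fixed for `switch_period`
    environment steps before re-sampling, so exploration noise is
    correlated over time rather than independent at every step.
    """

    def __init__(self, policies, switch_period=10, rng=None):
        self.policies = policies          # list of callables: obs -> action
        self.switch_period = switch_period
        self.rng = rng or np.random.default_rng()
        self._steps_left = 0
        self._active = None

    def act(self, obs):
        # Re-sample the active ensemble member only every `switch_period` steps.
        if self._steps_left == 0:
            self._active = int(self.rng.integers(len(self.policies)))
            self._steps_left = self.switch_period
        self._steps_left -= 1
        return self.policies[self._active](obs)


if __name__ == "__main__":
    # Toy usage: three random linear "policies" acting on 4-dim observations.
    rng = np.random.default_rng(0)
    policies = [
        (lambda W: (lambda obs: W @ obs))(rng.normal(size=(2, 4)))
        for _ in range(3)
    ]
    explorer = SwitchingEnsembleExplorer(policies, switch_period=5, rng=rng)
    for t in range(20):
        obs = rng.normal(size=4)
        action = explorer.act(obs)  # actions stay consistent within each window
```

Because the same policy acts for a window of steps, the resulting state visitation resembles the broad, directed coverage often credited to hierarchical controllers, without any learned high-level policy.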
