Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still make appropriate decisions in every state. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive decides for itself whether it wishes to act in the current state. We use an information-theoretic mechanism to enable this decentralized decision: each primitive chooses how much information it needs about the current state to make a decision, and the primitive that requests the most information acts in the world. The primitives are regularized to use as little information as possible, which leads to natural competition and specialization. We experimentally demonstrate that this policy architecture improves over both flat and hierarchical policies in terms of generalization.
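
The mechanism described above can be rendered as a short sketch. This is a minimal, hypothetical PyTorch illustration under stated assumptions, not the authors' implementation: each primitive encodes the state into a Gaussian latent, its "information request" is measured as the KL divergence from that latent to a unit-Gaussian prior, a softmax over these KL values determines which primitive dominates the action, and a beta-weighted information cost regularizes every primitive toward requesting as little state information as possible. All names (Primitive, CompetitivePrimitives, beta, latent_dim) are illustrative assumptions.

```python
# Hypothetical sketch of competitive, information-constrained primitives.
# Assumptions (not taken from the paper's code): unit-Gaussian latent prior,
# soft competition via a softmax over per-primitive KL terms, and a scalar
# beta weighting the information penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Primitive(nn.Module):
    """One primitive: a stochastic state encoder plus an action head."""

    def __init__(self, state_dim, latent_dim, action_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 2 * latent_dim),
        )
        self.action_head = nn.Linear(latent_dim, action_dim)

    def forward(self, state):
        mu, log_var = self.encoder(state).chunk(2, dim=-1)
        # Information request: KL( N(mu, sigma^2) || N(0, I) ), i.e. how many
        # nats of state information this primitive asks for.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        return self.action_head(z), kl


class CompetitivePrimitives(nn.Module):
    """Ensemble without a meta-policy: primitives compete through how much
    state information they request; the largest request dominates control."""

    def __init__(self, n_primitives, state_dim, latent_dim, action_dim, beta=1e-3):
        super().__init__()
        self.primitives = nn.ModuleList(
            [Primitive(state_dim, latent_dim, action_dim) for _ in range(n_primitives)]
        )
        self.beta = beta  # strength of the information penalty

    def forward(self, state):
        actions, kls = zip(*(p(state) for p in self.primitives))
        kls = torch.stack(kls, dim=-1)                     # (batch, K)
        weights = F.softmax(kls, dim=-1)                   # competition over info requests
        action = (torch.stack(actions, dim=-1) * weights.unsqueeze(-2)).sum(dim=-1)
        # Per-step information cost, to be subtracted from the reward (or added
        # to the policy loss) so primitives use as little information as possible.
        info_cost = self.beta * (weights * kls).sum(dim=-1)
        return action, weights, info_cost
```

In training, the returned info_cost would be subtracted from the environment reward (or added to the policy loss), so a primitive only requests information about a state when acting there is worth the cost; this is what drives the competition and specialization described in the abstract.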
